
Posted by simonw 20 hours ago

2025: The Year in LLMs (simonwillison.net)
809 points | 437 comments
ksec 12 hours ago|
All these improvements in a single year, 2025. While this may seem obvious to those who follow the AI / LLM news, it may be worth pointing out again that ChatGPT was introduced to us in November 2022.

I still don't believe AGI, ASI, or whatever AI will overtake humans in a short period of time, say 10 - 20 years. But it is hard to argue against the value of current AI, though many of the vocal critics on HN seem intent on doing so. People are willing to pay $200 per month, and it is getting a $1B runway already.

Being more of a hardware person, the most interesting part to me is how all of this is funding the development of the latest hardware. I know this is another topic HN hates because of the DRAM and NAND pricing issue, but it is exciting to see it from a long-term view where the pricing is short-term pain. Right now the industry is asking: we collectively have over a trillion dollars to spend on capex over the next few years, and will borrow more if we need to, so when can you ship us 16A / 14A / 10A and 8A or 5A, LPDDR6, higher-capacity DRAM at lower power usage, better packaging, higher-speed PCIe, or a jump to optical interconnect? Every single part of the hardware stack is being infused with money and demand. The last time we had this was the Post-PC / smartphone era, which drove the hardware industry forward for 10 - 15 years. The current AI wave can push hardware for at least another 5 - 6 years while pulling forward tech that was initially 8 - 10 years away.

I so wish I had bought some Nvidia stock. Then again, I guess no one knew AI would be as big as it is today, and it has only just started.

wpietri 6 hours ago||
This is not a great argument:

> But it is hard to argue against the value of current AI [...] it is getting a $1B runway already.

The psychic services industry makes over $2 billion a year in the US [1], with about a quarter of the population being actual believers [2].

[1] https://www.ibisworld.com/united-states/industry/psychic-ser...

[2] https://news.gallup.com/poll/692738/paranormal-phenomena-met...

ctoth 3 hours ago|||
2022/2023: "It hallucinates, it's a toy, it's useless."

2024/2025: "Okay, it works, but it produces security vulnerabilities and makes junior devs lazy."

2026 (Current): "It is literally the same thing as a psychic scam."

Can we at least make predictions for 2027? What shall the cope be then! Lemme go ask my psychic.

bopbopbop7 2 hours ago||
2022/2023: "Next year software engineering is dead"

2024: "Now this time for real, software engineering is dead in 6 months, AI CEO said so"

2025: "I know a guy who knows a guy who built a startup with an LLM in 3 hours, software engineering is dead next year!"

What will be the cope for you this year?

aspenmartin 3 minutes ago||
The cope + disappointment will come from knowing that a large population of HN users will paint a weird alternative reality. There is a multitude of messages about AI out there, some highly detached from reality (on both the optimistic and the pessimistic side). And then there is the rational middle: professionals who see the obvious value of coding agents in their workflow and use them extensively (or figure out how to best leverage them to get the most mileage).

I don't see software engineering being "dead" ever, but the nature of the job _has already changed_ and will continue to change. Look at Sonnet 3.5 -> 3.7 -> 4.5 -> Opus 4.5; that was 17 months of development, and the leaps in performance are quite impressive. You then have massive hardware buildouts and improvements to the stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks) and also push towards the next paradigm to solve things like continual learning.

Some folks have opted not to use coding agents (and some folks like yourself seem to revel in strawmanning people who point out their demonstrable usefulness). Not using coding agents in Jan 2026 is defensible. It won't be defensible for long.
apexalpha 6 hours ago|||
What if these provide actual value through the placebo effect?
wpietri 5 hours ago|||
I think we have different definitions of "actual value". But even if I pick the flaccid definition, that isn't proof of value of the thing itself, but of any placebo. In which case we can focus on the cheapest/least harmful placebo. Or, better, solving the underlying problem that the placebo "helps".
computably 4 hours ago||
I'll preface by saying I fully agree that psychics aren't providing any non-placebo value to believers, although I think it's fine to provide entertainment for non-believers.

> Or, better, solving the underlying problem that the placebo "helps".

The underlying problems are often a lack of a decent education and a generally difficult/unsatisfying life. These are systemic issues which can't be meaningfully "solved" without massive resources and political will.

jay_kyburz 1 hour ago||
Actually, I'd go one step further and say they are harmful to everybody else.

It might just be my circles, but I've seen Carl Sagan's quote everywhere in the last couple of months.

"“Science is more than a body of knowledge; it is a way of thinking. I have a foreboding of an America in my children’s or grandchildren’s time—when the United States is a service and information economy; when nearly all the key manufacturing industries have slipped away to other countries; when awesome technological powers are in the hands of a very few, and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what’s true, we slide, almost without noticing, back into superstition and darkness.”"

recursive 6 hours ago|||
You talking about psychics or LLMs?
grosswait 6 hours ago||
Yes
jillesvangurp 10 hours ago|||
2025 was the year of development-tool-using AI agents. I think we'll shift attention to AI agents that use non-development tools. Most business users are still stuck using ChatGPT as some kind of grand oracle that will write their email or PowerPoint slides. There are bits and pieces of mostly technology-demo-level solutions, but nothing that is widely used the way AI coding tools are so far. I don't think this is bottlenecked on model quality.

I don't need an AGI. I do need a secretary-type agent that deals with all the simple yet laborious non-technical tasks that keep infringing on my quality engineering time. I'm the CTO of a small startup and the amount of non-technical bullshit that I need to deal with is enormous. Some examples of random crap I deal with: figuring out contracts, their meaning/implications for situations, and deciding on a course of action; customer offers; price calculations; scraping invoices from emails and online SaaS accounts; formulating detailed replies to customer requests; HR legal work; corporate bureaucracy; financial planning; etc.

A lot of this stuff can be AI-assisted (and we get a lot of value out of AI tools for this), but context engineering is taking up a non-trivial amount of my time. Also, most tools are completely useless at modifying structured documents. Refactoring a big code base, no problem. Adding structured text to an existing structured document, hardest thing ever. The state of the art here is an ff-ing sidebar that will suggest markdown-formatted text that you might copy/paste. Tool quality is very primitive. And then you find yourself just stripping all formatting and reformatting it manually, because the tools really suck at this.

arcatech 5 hours ago|||
> Some examples of random crap I deal with: figuring out contracts, their meaning/implication to situations, and deciding on a course of action

This doesn’t sound like bullshit you should hand off to an AI. It sounds like stuff you would care about.

jillesvangurp 4 hours ago|||
I do care about it; it's kind of my duty as a co-founder, which is why I'm spending double-digit percentages of my time doing this stuff. But I absolutely could use some tools to cut down on a lot of the drudgery that is involved. And me reading through 40 pages of dense legal German isn't one of my strengths, since I 1) do not speak German, 2) am not a lawyer, and 3) am not necessarily deeply familiar with all the bureaucracy, laws, etc.

But I can ask an LLM intelligent questions about that contract (in English), shoot a few things back and forth, come up with some kind of action plan, and then run it by our lawyers and other advisors.

That's not some kind of hypothetical thing. That's something that happened multiple times in our company in the last few months. LLMs are very empowering for dealing with this sort of thing. You still need experts for some stuff. But you can do a lot more yourself now. And as we've found out, some of the "experts" that we relied on in the past actually did a pretty shoddy job. A lot of this stuff was about picking apart the mess they made and fixing it.

As soon as you start drafting contracts, it gets a lot harder. I just went through a process like that as well. It involves a lot of manual work that is basically about formatting documents, drafting text, running PDFs and text snippets through ChatGPT for feedback, sparring, criticism, etc., and iterating on that. This is not about vibe coding some contract but about making sure every letter of a contract is right. That ultimately involves lawyers and negotiating with other stakeholders, but it helps if you come prepared with a more or less ready-to-sign-off-on document.

It's not about handing stuff off but about making LLMs work for you. Just like with coding tools. I care about code quality as well. But I still use the tools to save me a lot of time.

simonw 4 hours ago||
One of the lessons I learned running a startup is that it doesn't matter how good the professionals you hire are for things like legal and accounting, you still need to put work in yourself.

Everyone makes mistakes and misses things, and as the co-founder you have to care more about the details than anyone else does.

I would have loved to have weird-unreliable-paralegal-Claude available back when I was doing that!

nrclark 5 hours ago|||
Agree. Even asking it can anchor your thinking.
topaztee 4 hours ago|||
> Also most tools are completely useless at modifying structured documents

We built a tool for this in the life science space and are opening it up to the general public very soon. Email me and I can give you access (topaz at vespper dot com).

utopiah 11 hours ago|||
> All these improvement in a single year

> hard to argue against the value of current AI

> People are willing to pay $200 per month, and it is getting a $1B runway already.

Those are 3 different things. There can be a LOT of fast and significant improvements while still remaining extremely far from the actual goal; so far it actually looks like little progress.

People pay for a lot of things, including snake oil, so convincing a lot of people to pay a bit is not in itself proof of value, especially when some people are basically coerced into it; see how many companies changed their "strategy" to mandating AI usage internally, or integrated it for a captive audience, e.g. Copilot.

Finally, yes, $1B is a LOT of money for you and me... but for the largest corporations it's actually not a lot. For reference, Google earned that in revenue... per day in 2023. Anyway, that's still a big number, BUT it still has to be compared with how much OpenAI burns. I don't have any public number on that, but I believe the consensus is that it's a lot. So until we know that number we can't talk about an actual runway.

HumblyTossed 1 hour ago|||
It's a great tool, but right now it's only being used to feed the greed.

>> Again, I guess no one knew AI would be as big as it is today, and it has only just started.

People have been saying something similar about self-driving cars for years now. "AI" is another one of those expensive ideas where we'll get 85% of the way there, and then the other 15% will be way more expensive than anyone wants to pay for. It's already happening - HW prices and electricity - people are starting to ask, "if I put more $ into this machine, when am I actually going to start getting money out?" The "true believers" are like, soon! But people are right to be hugely skeptical.

jliptzin 33 minutes ago||
There are some things it's really great at. For example, handling a CSS layout. If we have to spend trillions of dollars and get nothing else out of it other than being able to vertically center a <div> without wrestling with CSS and wanting to smash the keyboard in the process, it will all have been worth it.
pjc50 10 hours ago|||
Investing a trillion dollars for a revenue of a billion dollars doesn't sound great yet.
steveBK123 6 hours ago||
Indeed, it's the old Uber playbook at nearly two extra orders of magnitude.

The numbers are large enough that it may simply run out of private capital to consume before it turns cash-flow positive.

Lots of things sell well if sold at such a loss. I’d take a new Ferrari for $2500 if it was on offer.

pjc50 44 minutes ago|||
Did Uber actually do a lot of capital investment? They don't own the cars, for example.
simonw 22 minutes ago||
I believe they spent a huge amount of money on incentives to help sign up drivers, and discounts to help attract customers.
aoeusnth1 1 hour ago||||
You say that as if Uber's playbook didn't work. Try this: https://www.google.com/finance/quote/UBER:NYSE
derwiki 5 hours ago|||
Uber’s playbook worked for Uber
coffeebeqn 12 hours ago|||
Seems like Nvidia will be focusing on the super beefy GPUs and leaving the consumer market to a smaller player
Flow 10 hours ago|||
I don't get why Nvidia can't do both. Is it because of the limited production capacity of the factories?
ACCount37 9 hours ago||
Yes. If you're bottlenecked on silicon and secondaries like memory, why would you want to put more of those resources into lower margin consumer products if you could use those very resources to make and sell more high margin AI accelerators instead?

From a business standpoint, it makes some sense to throttle the gaming supply some. Not to the point of surrendering the market to someone else probably, but to a measurable degree.

ksec 7 hours ago||
We will have to wait and see, but my bet is that Nvidia will move to the leading-edge N2 node earlier now that they have the margin to work with. Both Hopper and Blackwell were too late in the design cycle. The AI side will continue to buy the latest and greatest, leaving gaming at a mainstream node.

Nvidia using a mainstream node has always been the norm, considering most fab capacity always goes to mobile SoCs first. But I expect the internet / gamers will be angry anyway because Nvidia is not providing them with the latest and greatest.

In reality, the extra R&D cost of designing for the leading edge will be amortised across all the AI orders, which gives Nvidia a competitive advantage at the consumer level when they compete. That is assuming there is competition, because the most recent data show Nvidia owning 90%+ of the discrete market share, 9% for AMD and 1% for Intel.

_s 11 hours ago|||
AMD owns a lot of the consumer market already: handhelds, consoles, desktop rigs and mobile... they are not a small player.
utopiah 11 hours ago||
They said "smaller" not small.
Atomic_Torrfisk 4 hours ago|||
> People are willing to pay $200 per month

Some people are of course, but how many?

> ... People are willing to pay $200 per month

This is just low-key hype. Careful with your portfolio...

chias 11 hours ago|||
These are not all improvements. Listed:

* The year of YOLO and the Normalization of Deviance

* The year that Llama lost its way

* The year of alarmingly AI-enabled browsers

* The year of the lethal trifecta

* The year of slop

* The year that data centers got extremely unpopular

Y_Y 1 hour ago|||
Not that YOLO; pjreddie released that in 2015
mbesto 4 hours ago||||
Said differently - the year we start to see all of the externalities of a globally scaled hyped tech trend.
steveBK123 6 hours ago|||
> * The year that data centers got extremely unpopular

I was discussing the political angle with a friend recently. I think the Big Tech Bro / VC complex has done itself a big disservice by aligning so tightly with MAGA, to the point that AI will be a political issue in 2026 & 2028.

Think about the message they’ve inadvertently created for themselves - AI is going to replace jobs, it’s pushing electricity prices up, we need the government to bail us out AND give us a regulatory light touch.

Super easy campaign for Dems - big tech Trumpers are taking your money, your jobs, causing inflation, and now they want bailouts!!

ACCount37 9 hours ago|||
Is the AI progress in 2025 an outstanding breakthrough? Not really. It's impressive but incremental.

Still, the gap between the capabilities of a cutting-edge LLM and those of a human is only so wide. There are only so many increments it takes to cross it.

belter 52 minutes ago|||
>> But it is hard to argue against the value of current AI, though many of the vocal critics on HN seem intent on doing so.

What is the concrete business case? Can anyone point to a revenue producing company using AI in production, and where AI is a material driver of profits?

Tool vendors don’t count. I’m not interested in how much money is being made selling shovels...show me a miner who actually struck gold please.

tstrimple 10 hours ago||
[flagged]
cherryteastain 9 hours ago|||
Sam Altman [1] certainly seems to talk about AGI quite a bit

[1] https://blog.samaltman.com/reflections

ACCount37 9 hours ago|||
Honestly, I wouldn't be surprised if a system that's an LLM at its core can attain AGI, with nothing but incremental advances in architecture, scaffolding, training and raw scale.

Mostly the training. I put less and less weight on "LLMs are fundamentally flawed" and more and more of it on "you're training them wrong". Too many "fundamental limitations" of LLMs are ones you can move the needle on with better training alone.

The LLM foundation is flexible and capable, and the list of "capabilities that are exclusive to the human mind" is ever shrinking.

tim333 4 hours ago|||
They seem to be missing a bit on learning as you go and thinking about things and getting new insights.
HarHarVeryFunny 4 hours ago|||
That depends on how you define AGI - it's a meaningless term to use since everyone uses it to mean different things. What exactly do you mean?!

Yes, there is a lot that can be improved via different training, but at what point is it no longer a language model (i.e. something that auto-regressively predicts language continuations)?

I like to use an analogy to the children's "Stone Soup" story whereby a "stone soup" (starting off as a stone in a pot of boiling water) gets transformed into a tasty soup/stew by strangers incrementally adding extra ingredients to "improve the flavor" - first a carrot, then a bit of beef, etc. At what point do you accept that the resulting tasty soup is not in fact stone soup?! It's like taking an auto-regressively SGD-trained Transformer, and incrementally tweaking the architecture, training algorithm, training objective, etc, etc. At some point it becomes a bit perverse to choose to still call it a language model

Some of the "it's just training" changes that would be needed to make today's LLMs more brain-like may be things like changing the training objective completely from auto-regressive to predicting external events (with the goal of having it be able to learn the outcomes of it's own actions, in order to be able to plan them), which to be useful would require the "LLM" to then be autonomous and act in some (real/virtual) world in order to learn.

Another "it's just training" change would be to replace pre/mid/post-training with continual/incremental runtime learning to again make the model more brain-like and able to learn from it's own autonomous exploration of behavior/action and environment. This is a far more profound, and ambitious, change than just fudging incremental knowledge acquisition for some semblance of "on the job" learning (which is what the AI companies are currently working on).

If you put these two "it's just training/learning" enhancements together then you've now got something much more animal/human-like, and much more capable than an LLM, but it's already far from a language model - something that passively predicts the next word every time you push the "generate next word" button. This would now be an autonomous agent, learning how to act and control/exploit the world around it. The whole pre-trained, same-for-everyone model running in the cloud would then be radically different - every model instance is then more like an individual learning based on its own experience, and maybe you're now paying for the continual-learning compute rather than just "LLM tokens generated".

These are "just" training (and deployment!) changes, but to more closely approach human capability (but again, what to you mean by "AGI"?) there would also need to be architectural changes and additions to the "Transformer" architecture (add looping, internal memory, etc), depending on exactly how close you want to get to human/animal capability.

losvedir 21 minutes ago||
I predict 2026 will be the year of the first AI Agent "worm" (or virus?). Kind of like the Morris worm running amok as an experiment gone wrong, I think we will sometime soon have someone set up an AI agent whose core loop is to try to propagate itself, either as an experiment or just for the lulz.

The actual Agent payload would be very small, likely just a few hundred line harness plus system prompt. It's just a question of whether the agent will be skilled enough to find vulnerabilities to propagate. The interesting thing about an AI worm is that it can use different tricks on different hosts as it explores its own environment.

If a pure agent worm isn't capable enough, I could see someone embedding it on top of a more traditional virus. The normal virus would propagate as usual, but it would also run an agent to explore the system for things to extract or attack, and to find easy additional targets on the same internal network.

A main difference here is that the agents have to call out to a big SotA model somewhere. I imagine the first worm will simply use Opus or ChatGPT with an acquired key, and part of it will be trying to identify (or generate) new keys as it spreads.

Ultimately, I think this worm will be shut down by the model vendor, but it will have to have made a big enough splash beforehand to catch their attention and create a team to identify and block keys making certain kinds of requests.

I'd hope OpenAI, Anthropic, etc have a team and process in place already to identify suspicious keys, eg, those used from a huge variety of IPs, but I wouldn't be surprised if this were low on their list of priorities (until something like this hits).

andai 11 hours ago||
Re: yolo mode

I looked into Docker and then realized the problem I'm actually trying to solve was solved in like 1970 with users and permissions.

I just made an agent user limited to its own home folder, and added my user to its group. Then I run Claude Code etc. as the agent user.

So it can only read/write /home/agent, and it cannot read or write my files.

I add myself to the agent group so I can read/write the agent files.

I run into permission issues sometimes, but it's pretty smooth for the most part.

Oh, also I gave it root on a $3 VPS. It's so nice having a sysadmin! :) That part definitely feels a bit deviant though!

andai 44 minutes ago||
Re: yolo mode

https://markdownpastebin.com/?id=1ef97add6ba9404b900929ee195...

My notes from back when I set this up! Includes instructions for using a GUI file explorer as the agent user, as well as setting up a systemd service to fix the permissions automatically.

(And a nice trick which shows you which GUI apps are running as which user...)

However, most of these are just workarounds for the permission issue I kept running into, which is that Claude Code would for some reason create files with incorrect permissions so that I couldn't read or write those files from my normal account.

If someone knows how to fix that, or if someone at Anthropic is reading, then most of this Rube Goldberg machine becomes unnecessary :)
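
(For anyone who wants a concrete starting point: below is a minimal sketch of the kind of permission-fixing script the systemd service mentioned above could run, assuming the root cause is missing group read/write bits on files the agent creates. The /home/agent path and the "agent" group are just the ones from the setup described in this thread, not anything official, and the script would need to run as the agent user or root.)

    #!/usr/bin/env python3
    """Make everything under the agent's home readable/writable by the
    'agent' group, so a human account in that group can access it too.
    Assumes the layout described above: user 'agent', home /home/agent.
    Run it periodically, e.g. from a systemd timer or path unit."""
    import os
    import stat

    AGENT_HOME = "/home/agent"  # assumed path from the setup above

    for root, dirs, files in os.walk(AGENT_HOME):
        for name in dirs + files:
            path = os.path.join(root, name)
            mode = os.stat(path).st_mode
            # Add group read/write, plus traverse permission for directories.
            extra = stat.S_IRGRP | stat.S_IWGRP
            if stat.S_ISDIR(mode):
                extra |= stat.S_IXGRP
            os.chmod(path, mode | extra)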

staeff777 2 hours ago|||
I really like this idea and just tried some steps for myself.

Create a user with a home dir: sudo useradd -m agent

Add myself to the agent group: sudo usermod -a -G agent $USER

Allow the agent group access to the agent home dir: sudo chmod -R 770 /home/agent

Start a new shell with the group (or log off/on): newgrp agent

Now you should be able to change into the agent home.

Allow your user to sudo as agent: echo "$USER ALL=(agent) NOPASSWD: ALL" | sudo tee -a /etc/sudoers.d/$USER-as-agent

Now you can start your agent using sudo: sudo -u agent your_agent

works nice.

jillesvangurp 10 hours ago|||
I use a QEMU VM for running Codex CLI in yolo mode and use simple ssh-based git operations for getting code in and out of there. Works great. And you can also do fun things like let it loose on multiple git projects in one prompt. The VM can run Docker as well, which helps with containerized tests and other more complicated things. One thing I've started to observe is that you spend more time waiting for tool execution than for model inference, so having a fast local VM is better than a slower remote one.
some_developer 8 hours ago|||
Docker in docker, with opencode.

Opencode plus some scripts on the host and in its container works well to run yolo mode and only see what it needs (via mounting). It has git tools but can't push etc., and it is taught how to run tests with the special container-in-container setup.

Including pre-configured MCPs, skills, etc.

The best part is that it just works for everyone on the team, big plus.

knicholes 4 hours ago||
cgroups and namespaces
ogou 17 hours ago||
This is a good tooling survey of the past year. I have been watching it as a developer re-entering the job market. The job descriptions closely parallel the timeline used in the post. That's bizarre to me because these approaches are changing so fast. I see jobs for "Skill and Langchain experts with production-grade 0>1 experience. Former founders preferred". That is an expertise that is just a few months old and startups are trying to build whole teams overnight with it. I'm sure January and February will have job postings for whatever gets released that week. It's all so many sand castles.
weatherlite 13 hours ago|
> Skill and Langchain experts with production-grade 0>1 experience.

Also, it's just normal backend work - calling a bunch of APIs. What am I missing here?

XenophileJKO 11 hours ago|||
That is like saying training TensorFlow models is just calling some APIs.

Actually making a system like this work seems easy, but isn't really.

(Though with the CURRENT generation or two of models it has gotten "pretty easy" I think. Before that, not so much.)

weatherlite 7 hours ago||
No idea about training TensorFlow models - is it super complex or is it just calling a couple of APIs? LangChain is literally calling an API. Maybe you need to get good with prompting or whatever, but I don't see where the complexity lies. Please let me know.
andy99 5 hours ago||
Having used both TensorFlow (though I expect they mean PyTorch, which is way more popular and which I have also used) and LangChain, they are nothing alike.

The ML frameworks are much closer to implementing the mathematics of neural networks - there are some abstractions, but you're working much nearer the linear-algebra level. Using them requires an understanding of the underlying theory.

LangChain is a suite of convenience functions for composing prompts to LLMs. I wouldn't consider there to be some real domain knowledge one would need to use it. There is a learning curve, but it's about learning the different components rather than learning a whole new academic discipline.

HarHarVeryFunny 3 hours ago||
There's a big difference between building an ML framework like Tensorflow or PyTorch (I built a Lua Torch-like one in C++ myself) and just using it to build/train a model.

Building the model may range from very simple if you are just recreating a standard architecture, or be a research endeavor if you are designing something completely new.

The difficulty/complexity of then training the model depends on what it is. For something simple like a CNN for image recognition, it's really just a matter of selecting a few hyperparameters and letting it rip. At the other end of the spectrum you've got LLMs where training (and coping with instabilities) is something of a black art, with RL training completely different from pre-training, and there is also the issue of designing/discovering a pre/mid/post training curriculum.

But anyways, the actual training part can be very simple, not requiring too much knowledge of what's going on under the hood, depending on the model.
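
(To illustrate that simple end of the spectrum, here is a minimal sketch of training a small CNN on MNIST with PyTorch and torchvision. The architecture and hyperparameter values are arbitrary examples chosen for the sketch, not anything from the discussion above.)

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # The hyperparameters are roughly the only knobs you pick for a model this simple.
    lr, batch_size, epochs = 1e-3, 64, 2

    train_data = datasets.MNIST("data", train=True, download=True,
                                transform=transforms.ToTensor())
    loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

    # A tiny CNN: two conv/pool stages, then a linear classifier over 10 digits.
    model = nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    # "Letting it rip": a plain training loop, no tuning tricks.
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: last batch loss {loss.item():.3f}")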

walthamstow 11 hours ago|||
Buzzwords.
waldrews 19 hours ago||
Remember, back in the day, when a year of progress was like, oh, they voted to add some syntactic sugar to Java...
nrhrjrjrjtntbt 15 hours ago||
More like 6 different new NoSQL databases and JS frameworks.
dotancohen 13 hours ago||
A WordPress zero-day and Linux not on the desktop. Netcraft confirms it.
crystal_revenge 15 hours ago|||
That must have been a long time back. Having lived through the time when web pages were served through CGI and mobile phones only existed in movies, when SVMs were the new hotness in ML and people would write about how weird NNs were, I feel like I've seen a lot more concrete progress in the last few decades than this year.

This year honestly feels quite stagnant. LLMs are literally technology that can only reproduce the past. They're cool, but they were way cooler 4 years ago. We've taken big ideas like "agents" and "reinforcement learning" and basically stripped them of all meaning in order to claim progress.

I mean, do you remember Geoffrey Hinton's RBM talk at Google in 2010? [0] That was absolutely insane for anyone keeping up with that field. By the mid-twenty-teens RBMs were already outdated. I remember when everyone was implementing flavors of RNNs and LSTMs. Karpathy's 2015 character-level RNN project was insane [1].

This comment makes me wonder if part of the hype around LLMs is just that a lot of software people simply weren't paying attention to the absolutely mind-blowing progress we've seen in this field for the last 20 years. But even ignoring ML, the worlds of web development and mobile application development have gone through incredible progress over the last decade and a half. I remember a time when JavaScript books would have a section warning that you should never use JS for anything critical to the application. Then there's the work in theorem provers over the last decade... If you remember when syntactic sugar was progress, either you remember way further back than I do, or you weren't paying attention to what was happening in the larger computing world.

0. https://www.youtube.com/watch?v=VdIURAu1-aU

1. https://karpathy.github.io/2015/05/21/rnn-effectiveness/

HarHarVeryFunny 33 minutes ago|||
> LLMs are literally technology that can only reproduce the past.

That's incorrect on many levels. They are drawing upon, and reproducing, language patterns from "the past", but they are combining those patterns in ways that may never have been seen before. They may not be truly creative, but they are still capable of generating novel outputs.

> They're cool, but they were way cooler 4 years ago.

Maybe this year has been more about incremental progress with LLMs than the shock/coolness factor of talking to an LLM for the first time, but the utility of them, especially for programming, has dramatically increased this year, really in the last 6 months.

The improvement in "AI" image and video generation has also been impressive, to the point now that fake videos on YouTube can often only be identified as such by common sense rather than the fact that they don't look real.

Incremental improvement can often be more impressive than innovation, whose future importance can be hard to judge when it first appears. How many people read "Attention Is All You Need" in 2017 and thought "Wow! This is going to change the world!"? Not even the authors of the paper thought that.

handoflixue 15 hours ago||||
> LLMs are literally technology that can only reproduce the past.

Funny, I've used them to create my own personalized text editor, perfectly tailored to what I actually want. I'm pretty sure that didn't exist before.

It's wild to me how many people who talk about LLMs apparently haven't learned how to use them for even very basic tasks like this! No wonder you think they're not that powerful if you don't even know basic stuff like this. You really owe it to yourself to try them out.

crystal_revenge 14 hours ago|||
> You really owe it to yourself to try them out.

I've worked at multiple AI startups in lead AI Engineering roles, both working on deploying user facing LLM products and working on the research end of LLMs. I've done collaborative projects and demos with a pretty wide range of big names in this space (but don't want to doxx myself too aggressively), have had my LLM work cited on HN multiple times, have LLM based github projects with hundreds of stars, appeared on a few podcasts talking about AI etc.

This gets to the point I was making. I'm starting to realize that part of the disconnect between my opinions on the state of the field and others is that many people haven't really been paying much attention.

I can see if recent LLMs are your first intro to the state of the field, it must feel incredible.

CamperBob2 14 hours ago|||
That's all very impressive, to be sure. But are you sure you're getting the point? As of 2025, LLMs are now very good at writing new code, creating new imagery, and writing original text. They continue to improve at a remarkable rate. They are helping their users create things that didn't exist before. Additionally, they are now very good at searching and utilizing web resources that didn't exist at training time.

So it is absurdly incorrect to say "they can only reproduce the past." Only someone who hasn't been paying attention (as you put it) would say such a thing.

windexh8er 12 hours ago|||
> They are helping their users create things that didn't exist before.

That is derived output. It isn't new as in novel. It may be unique, but it is derived from training data. LLMs legitimately cannot think and thus they cannot create in that way.

ordersofmag 6 hours ago|||
I will find this often-repeated argument compelling only when someone can prove to me that the human mind works in a way that isn't 'combining stuff it learned in the past'.

5 years ago a typical argument against AGI was that computers would never be able to think because "real thinking" involved mastery of language which was something clearly beyond what computers would ever be able to do. The implication was that there was some magic sauce that human brains had that couldn't be replicated in silicon (by us). That 'facility with language' argument has clearly fallen apart over the last 3 years and been replaced with what appears to be a different magic sauce comprised of the phrases 'not really thinking' and the whole 'just repeating what it's heard/parrot' argument.

I don't think LLM's think or will reach AGI through scaling and I'm skeptical we're particularly close to AGI in any form. But I feel like it's a matter of incremental steps. There isn't some magic chasm that needs to be crossed. When we get there I think we will look back and see that 'legitimately thinking' wasn't anything magic. We'll look at AGI and instead of saying "isn't it amazing computers can do this" we'll say "wow, was that all there is to thinking like a human".

windexh8er 5 hours ago|||
> 5 years ago a typical argument against AGI was that computers would never be able to think because "real thinking" involved mastery of language which was something clearly beyond what computers would ever be able to do.

Mastery of words is thinking? By that line of argument, computers have been able to think for decades.

Humans don't think only in words. Our context, memory and thoughts are processed and occur in ways we still don't understand.

There's a lot of great information out there describing this [0][1]. Continuing to believe these tools are thinking, however, is dangerous. I'd gather it has something to do with logic: you can't see the process and it's non-deterministic so it feels like thinking. ELIZA tricked people. LLMs are no different.

[0] https://archive.is/FM4y8

[0] https://www.theverge.com/ai-artificial-intelligence/827820/l...

[1] https://www.raspberrypi.org/blog/secondary-school-maths-show...

CamperBob2 3 hours ago||
> Mastery of words is thinking?

That's the crazy thing. Yes, in fact, it turns out that language encodes and embodies reasoning. All you have to do is pile up enough of it in a high-dimensional space, use gradient descent to model its original structure, and add some feedback in the form of RL. At that point, reasoning is just a database problem, which we currently attack with attention.

No one had the faintest clue. Even now, many people not only don't understand what just happened, but they don't think anything happened at all.

ELIZA, ROFL. How'd ELIZA do at the IMO last year?

meindnoch 47 minutes ago||
So people without language cannot reason? I don't think so.
CamperBob2 25 minutes ago||
There's no such thing as people without language, except for infants and those who are so mentally incapacitated that the answer is self-evidently "No, they cannot."

Language is the substrate of reason. It doesn't need to be spoken or written, but it's a necessary and (as it turns out) sufficient component of thought.

arcatech 5 hours ago|||
> I will find this often-repeated argument compelling only when someone can prove to me that the human mind works in a way that isn't 'combining stuff it learned in the past'.

This is the definition of the word ‘novel’.

Kerrick 12 hours ago||||
That is a pedantic distinction. You can create something that didn't exist by combining two things that did exist, using a way of combining things that already existed. For example, you could use a blender to combine almond butter and sawdust. While this may not be "novel", and it may be derived from existing materials and methods, you may still lay claim to having created something that didn't exist before.

For a more practical example, creating bindings from dynamic-language-A for a library in compiled-language-B is a genuinely useful task, allowing you to create things that didn't exist before. Those things are likely to unlock great happiness and/or productivity, even if they are derived from training data.

windexh8er 5 hours ago|||
> That is a pedantic distinction. You can create something that didn't exist by combining two things that did exist, in a way of combining things that already existed.

This is the definition of a derived product. Call it a derivative work if we're being pedantic; regardless, it is not any level of proof that LLMs "think".

threethirtytwo 3 hours ago|||
Pedantic and not true. The LLM has stochastic processes involved. Randomness. That’s not old information. That’s newly generated stuff.
jama211 11 hours ago||||
Yeah, you've lost me here, I'm sorry. In the real world humans work with AI tools to create new things. What you're saying is the equivalent of saying that when a human writes a book in English, because they use words and letters that already exist and that they already know, they aren't creating anything new.
nl 9 hours ago||||
What does "think" mean?

Why is that kind of thinking required to create novel works?

Randomness can create novelty.

Mistakes can be novel.

There are many ways to create novelty.

Also I think you might not know how LLMs are trained to code. Pre-training gives them some idea of the syntax etc but that only gets you to fancy autocomplete.

Modern LLMs are heavily trained using reinforcement data, which is custom tasks the labs pay people to do (or by distilling another LLM which has had the process performed on it).

windexh8er 5 hours ago||
> Also I think you might not know how LLMs are trained to code.

What's clear here is that you have zero idea what you're talking about while poorly mansplaining.

zingar 12 hours ago||||
Could you give us an idea of what you’re hoping for that is not possible to derive from training data of the entire internet and many (most?) published books?
techpression 11 hours ago||
This is the problem: the entire internet is a really bad set of training data because it's extremely polluted.

Also, the "derived" argument doesn't really hold; just because you know about two things doesn't mean you'd be able to come up with the third. It's actually very hard most of the time and requires you to do more than next-token prediction.

threethirtytwo 9 hours ago||
The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data. It can figure the world out just as we can figure it out when we too are inundated with bullshit data. The pathways exist in the LLM, but it won't necessarily reveal that to you unless you tune it with RL.
ahtihn 6 hours ago||
> The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data.

I don't believe they can. LLMs have no concept of truth.

What's likely is that the "truth" for many subjects is represented way more than fiction, and when there is objective truth it's consistently represented in a similar way. On the other hand, there are many variations of "fiction" for the same subject.

threethirtytwo 3 hours ago||
They can, and we have definitive proof. When we tune LLMs with reinforcement learning, the models end up hallucinating less and becoming more reliable. Basically, in a nutshell, we reward the model when it tells the truth and punish it when it doesn't.

So think of it like this: to create the model we use terabytes of data. Then we do RL, which probably involves less than one percent additional data on top of the initial training.

The change in the model is that reliability is increased and hallucinations are reduced at a far greater rate than one percent. So much so that modern models can be used for agentic tasks.

How can less than one percent's worth of reinforcement training improve the model's truthfulness by far more than one percent?

The answer is obvious. It ALREADY knew the truth. There’s no other logical way to explain this. The LLM in its original state just predicts text but it doesn’t care about truth or the kind of answer you want. With a little bit of reinforcement it suddenly does much better.

It's not a perfect process, and reinforcement learning often causes the model to be deceptive and not necessarily tell the truth; it instead gives an answer that may seem like the truth, or an answer that the trainer wants to hear. In general, though, we can measurably see a difference in truthfulness and reliability to an extent far greater than the data involved in training, and that is logical proof it knows the difference.

Additionally while I say it knows the truth already this is likely more of a blurry line. Even humans don’t fully know the truth so my claim here is that an LLM knows the truth to a certain extent. It can be wildly off for certain things but in general it knows and this “knowing” has to be coaxed out of the model through RL.

Keep in mind the LLM is just auto-trained on reams and reams of data. That training is massive. Reinforcement training is done by humans. A human must rate the answers, so there is significantly less of it.

closewith 8 hours ago|||
By that definition, nearly all commercial software development (and nearly all human output in general) is derived output.
windexh8er 5 hours ago||
Wow.

You’re using ‘derived’ to imply ‘therefore equivalent.’ That’s a category error. A cookbook is derived from food culture. Does an LLM taste food? Can it think about how good that cookie tastes?

A flight simulator is derived from aerodynamics - yet it doesn’t fly.

Likewise, text that resembles reasoning isn’t the same thing as a system that has beliefs, intentions, or understanding. Humans do. LLMs don't.

Also... Ask an LLM what's the difference between a human brain and an LLM. If an LLM could "think" it wouldn't give you the answer it just did.

CamperBob2 3 hours ago|||
> Ask an LLM what's the difference between a human brain and an LLM. If an LLM could "think" it wouldn't give you the answer it just did.

I imagine that sounded more profound when you wrote it than it did just now, when I read it. Can you be a little more specific, with regard to what features you would expect to differ between LLM and human responses to such a question?

Right now, LLM system prompts are strongly geared towards not claiming that they are humans or simulations of humans. If your point is that a hypothetical "thinking" LLM would claim to be a human, that could certainly be arranged with an appropriate system prompt. You wouldn't know whether you were talking to an LLM or a human -- just as you don't now -- but nothing would be proved either way. That's ultimately why the Turing test is a poor metric.

closewith 3 hours ago|||
You’re arguing against a straw man. No one is claiming LLMs have beliefs, intentions, or understanding. They don’t need them to be economically useful.
weatherlite 13 hours ago||||
> So it is absurdly incorrect to say "they can only reproduce the past."

Also, a shit-ton of what we do economically is reproducing the past with slight tweaks and improvements. We all do very repetitive things, and these tools cut the time / personnel needed by a significant factor.

crystal_revenge 14 hours ago|||
I think the confusion is people's misunderstanding of what "new code" and "new imagery" mean. Yes, LLMs can generate a specific CRUD webapp that hasn't existed before, but only by interpolating between the history of existing CRUD webapps. I mean, traditional Markov chains can also produce "new" text in the sense that "this exact text" hasn't been seen before, but nobody would argue that traditional Markov chains aren't constrained by "only producing the past".
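
(For illustration, a minimal word-level Markov chain generator; the toy corpus and the order-2 state are arbitrary choices for this sketch. It happily emits sequences that never appear verbatim in its corpus, yet it clearly does nothing but recombine that corpus.)

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        """Map each tuple of `order` consecutive words to the words that follow it."""
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, length=20):
        """Sample a 'new' sequence by walking the chain from a random state."""
        state = random.choice(list(chain))
        out = list(state)
        for _ in range(length):
            followers = chain.get(state)
            if not followers:
                break
            out.append(random.choice(followers))
            state = tuple(out[-len(state):])
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(generate(build_chain(corpus)))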

This is even more clear in the case of diffusion models (which I personally love using, and have spent a lot of time researching). All of the "new" images created by even the most advanced diffusion models are fundamentally remixing past information. This is really obvious to anyone who has played around with these extensively because they really can't produce truly novel concepts. New concepts can be added by things like fine-tuning or use of LoRAs, but fundamentally you're still just remixing the past.

LLMs are always doing some form of interpolation between different points in the past. Yes they can create a "new" SQL query, but it's just remixing from the SQL queries that have existed prior. This still makes them very useful because a lot of engineering work, including writing a custom text editor, involve remixing existing engineering work. If you could have stack-overflowed your way to an answer in the past, an LLM will be much superior. In fact, the phrase "CRUD" largely exists to point out that most webapps are fundamentally the same.

A great example of this limitation in practice is the work that Terry Tao is doing with LLMs. One of the largest challenges in automated theorem proving is translating human proofs into the language of a theorem prover (often Lean these days). The challenge is that there is not very much Lean code currently available to LLMs (especially with the necessary context of the accompanying NL proof), so they struggle to correctly translate. Most of the research in this area is around improving LLM's representation of the mapping from human proofs to Lean proofs (btw, I personally feel like LLMs do have a reasonably good chance of providing major improvements in the space of formal theorem proving, in conjunction with languages like Lean, because the translation process is the biggest blocker to progress).

When you say:

> So it is absurdly incorrect to say "they can only reproduce the past."

It's pretty clear you don't have a solid background in generative models, because this is fundamentally what they do: model an existing probability distribution and draw samples from that. LLMs are doing this for a massive amount of human text, which is why they do produce some impressive and useful results, but this is also a fundamental limitation.

But a world where we used LLMs for the majority of work would be a world with no fundamental breakthroughs. If you've read The Three-Body Problem, it's very much like living in the world where scientific progress is impeded by sophons. In that world there is still some progress (especially with abundant energy), but it remains fundamentally and deeply limited.

PeterHolzwarth 13 hours ago|||
Just an innocent bystander here, so forgive me, but I think the flak you are getting is because you appear to be responding to claims that these tools will reinvent everything and introduce a new halcyon age of creation - when, at least on Hacker News, and definitely in this thread, no one is really making such claims.

Put another way, and I hate to throw in the now over-used phrase, but I feel you may be responding to a strawman that doesn't much appear in the article or the discussion here: "Because these tools don't achieve a god-like level of novel perfection that no one is really promising here, I dismiss all this sorta crap."

Especially when I think you are also admitting that the technology is a fairly useful tool on its own merits - a stance which I believe represents the bulk of the feelings that supporters of the tech here on HN are describing.

I apologize if you feel I am putting unrepresentative words in your mouth, but this is the reading I am taking away from your comments.

aoeusnth1 1 hour ago||||
> It's pretty clear you don't have a solid background in generative models, because this is fundamentally what they do: model an existing probability distribution and draw samples from that.

After post-training, this is definitively NOT what an LLM does.

signatoremo 12 hours ago||||
Lots of impressive points. They are also irrelevant. The majority of people also only extrapolate from the knowledge they acquired in the past. That's why there is the concept of an inventor, someone who comes up with new ideas. Many new inventions are also based on existing ideas. Is that a reason to dismiss those achievements?

Do you only take LLMs seriously if they can be another Einstein?

> But a world where we used LLMs for the majority of work, would be a world with no fundamental breakthroughs.

What do you consider recent fundamental breakthroughs?

Even if you are right, humans can continue to work on hard problems while letting LLMs handle the majority of derivative work.

oedemis 4 hours ago||||
As architectures evolve, I think we may learn about more "side effects". Back in 2020, OpenAI researchers said "GPT-3 is applied without any gradient updates or fine-tuning"; the capability emerges at a certain level of scale...
throwaway7783 13 hours ago||||
Would you say that LLMs can discover patterns hitherto unknown? It would still be generating from the past, but patterns/connections not made before.
uxcolumbo 12 hours ago||||
How do human brains create something novel and what will it take for AIs to do the same?
threethirtytwo 8 hours ago||||
> It's pretty clear you don't have a solid background in generative models, because this is fundamentally what they do

You don't have a solid background. No one does. We fundamentally don't understand LLMs; this is the prevailing industry and academic opinion. Sure, there are high-level perspectives and analogies we can apply to LLMs and machine learning in general, like probability distributions, curve fitting or interpolation... but those explanations are so high level that they can essentially be applied to humans as well. At a lower level we cannot describe what's going on. We have no idea how to reconstruct the logic of how an LLM arrived at a specific output from a specific input.

It is impossible to have any sort of deterministic function, process or anything produce new information from old information. This limitation is fundamental to logic and math and thus it will limit human output as well.

You can combine information, you can transform information, you can lose information. But producing new information from old information via a deterministic intelligence is fundamentally impossible in reality, and therefore fundamentally impossible for LLMs and humans. But note the keyword: "deterministic".

New information can literally only arise through stochastic processes. That's all you have in reality. We know it's stochastic because determinism vs. stochasticity are literally your only two viable options. You have a bunch of inputs; the outputs derived from them are either purely deterministic transformations, or, if you want something new from the input, you must apply randomness. That's it.

That's essentially what creativity is. There is literally no other logical way to generate "new information". Purely random output is never really useful, so "useful information" arrives only after it is filtered: we use past information to filter the stochastic output and "select" something that's not wildly random. We also only use randomness to perturb the output a little bit so it's not too crazy.

In the end it’s this selection process and stochastic process combined that forms creativity. We know this is a general aspect of how creativity works because there’s literally no other way to do it.

LLMs do have stochastic aspects to them, so we know for a fact they are generating new things and not just drawing on the past. We know they can fit our definition of "creative", and we can literally see them be creative in front of our eyes.

You're ignoring what you see with your own eyes and drawing your conclusions from a model of LLMs that isn't fully accurate. Or you're not fully connecting the mechanisms of how LLMs work with what creativity, or generating new data from past data, actually is.

The fundamental limitation with LLMs is not that they can't create new things. It's that the context window is too small to create new things beyond it. Whatever they can create is limited to the possibilities within that window, and that sets a limit on creativity.

What you see happening with Lean can also be an issue of the context window being too small. If we had an LLM with a giant context window bigger than anything before... and passed it all the necessary data to "learn" and be "trained" on Lean, it could likely start to produce new theorems without literally being "trained".

Actually I wouldn’t call this a “fundamental” problem. More fundamental is the aspect of hallucinations. The fact that LLMs produce new information from past information in the WRONG way. Literally making up bullshit out of thin air. It’s the opposite problem of what you’re describing. These things are too creative and making up too much stuff.

We have hints that LLMs know the difference between hallucinations and reality, but our ability to coax them into communicating that distinction to us is limited.

threethirtytwo 9 hours ago||||
Over half of HN still thinks it's a stochastic parrot and that it's just a glorified Google search.

The change hit us so fast a huge number of people don’t understand how capable it is yet.

Also, it certainly doesn't help that it still hallucinates. One mistake is enough to set someone against LLMs. You really need to push through the hallucinations - they're just the weak part of the process - to see the value.

CamperBob2 2 hours ago||
The problem I see, over and over, is that people pose poorly-formed questions to the free ChatGPT and Google models, laugh at the resulting half-baked answers that are often full of errors and hallucinations, and draw conclusions about the technology as a whole.

Either that, or they tried it "last year" or "a while back" and have no concept of how far things have gone in the meantime.

It's like they wandered into a machine shop, cut off a finger or two, and concluded that their grandpa's hammer and hacksaw were all anyone ever needed.

handoflixue 14 hours ago|||
Seriously, all that familiarity and you think an LLM "literally" can't invent anything that didn't already exist?

Like, I'm sorry, but you're just flat-out wrong and I've got the proof sitting on my hard drive. I use this supposedly impossible program daily.

windexh8er 12 hours ago|||
Do you also think LLMs "think"?

From what you've described, an LLM has not invented anything. LLMs that can reason have a bit more sleight of hand, but they're not coming up with new ideas outside the bounds of what has already been put into words, in both fiction and non-fiction.

Good for you that you've got a fun bit of code that's what you've always wanted, I guess. But this kind of fantasy take on LLMs seems more and more prevalent as of late. A lot of people defend LLMs as if they're owed something because they've built something with them, or maybe people are getting more and more attached to them from the conversational angle. I'm not sure, but I've run across more people in 2025 who are way too far in the deep end of personifying their relationships with LLMs.

Kerrick 12 hours ago|||
Hang on, you're now saying that if something has ever been described in fiction it doesn't count as invention? So if somebody literally developed a working photon torpedo, that isn't new because "Star Trek Did It"?
windexh8er 4 hours ago|||
You seem to be pretty far down the rabbit hole. How about this... task an LLM with creating a photon torpedo. If it can truly think, it should be able to provide you with something tangible. When you've got that in hand, let us all know.

Back to the land of reality... Describing something in fiction doesn’t magically make it "not an invention". Fiction can anticipate an idea, but invention is about producing a working, testable implementation, usually involving novel technical methods. "Star Trek did it" is at most prior art for the concept, not a blueprint for the mechanism. If you can't understand that distinction, maybe go ask an LLM.

Kerrick 4 hours ago||
I didn't say anything about an LLM. I said "somebody" not "some predictive text engine."
phatfish 10 hours ago|||
Is there any danger an LLM is going to create a working photon torpedo?
ben_w 10 hours ago||
Well, they can use tools, and tools include physics simulations, so if it is possible (and FWIW the tool-free "intuition" of ChatGPT is "there will never be an age of antimatter"), then why couldn't LLMs grind through those tools to get a solution?
Alconicon 4 hours ago|||
[dead]
ctxc 10 hours ago||||
Some people cannot be convinced simply because their expectation of "novel" is something that appears in an Asimov novel.

I for one think your work is pretty cool - even though I haven't seen it, using something you built every day is a claim not many can make!

9rx 6 hours ago||||
When a computer is able to invent things, we’ve achieved AGI. Do you believe we are already in the AGI era, or is the inventor in this case actually you?
bigyabai 14 hours ago|||
FWIW, your "evidence" is a text editor. I'm glad you made a tool that works for you, but the parent's point stands; this is a 200-level course-curriculum homework assignment. Tens of thousands of homemade editors exist, in various states of disrepair and vain overengineering.
least 13 hours ago||
The difference is that this person is actually using the text editor they built with the help of LLMs. There are plenty of people creating novel scripts and programs that accommodate their own unique specifications.

If a programmer creating their own software (or contracting it out to a developer) is a bespoke suit, and using software someone else created without your input is an off-the-rack suit, I'd liken these sorts of programs to semi-bespoke, or made to measure.

"LLMs are literally technology that can only reproduce the past" feels like an odd statement. I think the point they're going for is that it's not thinking and so it's not going to produce new ideas like a human would? But literally no technology does that. That is all derived from some human beings being particularly clever.

LLMs are tools. They can enable a human to create new things because they are interfacing with a human to facilitate it. It's merging the functional knowledge and vision of a person and translating it into something else.

resize2996 1 hour ago||
compilers can only produce machine code. so unoriginal.
Greduan 12 hours ago||||
Text editors in a thousand flavours have indeed already been programmed, though. I don't think you understood what op meant.

Curious, does it perform at the limit of the hardware? Was it programmed in a systems language (like C++, Rust, C, etc.) or in web tech?

zingar 12 hours ago||
What point do you believe would be demonstrated by a new text editor running at the limit of the hardware in a compiled language? Would that point apply to every other text editor that already exists?
fmbb 11 hours ago|||
Is your new text editor open source?
waldrews 13 hours ago||||
I'm being hyperbolic of course, but I'm a little dismissive of the progress that happened since the days of BBS's and car based cell phones - we just got more connectivity, more capacity, more content, bigger/faster. Likewise, my attitude toward machine learning before 2023 is a smug 'heh, these computer scientists are doing undisciplined statistics at scale, how nice for them.' Then all of a sudden the machines woke up and started arguing with me, coherently, even about niche topics I have a PhD in. I can appreciate in retrospect how much of the machine learning progress ultimately went into that, but, like fusion, the magic payoff was supposed to be decades away and always remain decades away. This wasn't supposed to happen in my lifetime. 2025 progress isn't the 2023 shock, but this was the year LLM's-as-programmers (and LLM's-as-mathematicians, and...) went from 'isn't that cute, the machine is trying' to 'an expert with enough time would make better choices than the machine did,' and that makes for a different world. More so than, going from a Commodore Vic 20 with 4k of RAM and a modem to the latest Macbook.
ako 9 hours ago|||
> This year honestly feels quite stagnant. LLMs are literally technology that can only reproduce the past.

Is this such a big limitation? Most jobs are basically people trained on past knowledge applying it today. No need to generate new knowledge.

And a lot of new knowledge is just combining 2 things from the past in a new way.

throwup238 19 hours ago|||
> they voted to add some syntactic sugar to Java...

I remember when we just wanted to rewrite everything in Rust.

Those were the simpler times, when crypto bros seemed like the worst venture capitalism could conjure.

OGEnthusiast 18 hours ago||
Crypto bros in hindsight were so much less dangerous than AI bros. At least they weren't trying to construct data centers in rural America or prop up artificial stocks like $NVDA.
SauntSolaire 17 hours ago|||
Instead they were building crypto mining warehouses in rural America and propping up artificial currencies like BTC.
ryandrake 15 hours ago||
Crazy how the two most hyped and funded technologies of the decade were: energy wasting fake money for criminals and energy wasting plagiarism machines.
scotty79 12 hours ago||
[flagged]
zahlman 17 hours ago||||
Speaking of which, we never found out the details (strike price/expiration) of Michael Burry's puts, did we? It seems he could have made bank if he'd waited one more month...
kamranjon 16 hours ago||
I think they expire in March 2026 if the NVIDIA stock drops to $140 a share? Something close to that I think.
quaintpartridge 18 hours ago||||
They were, just not as many. https://www.wired.com/story/the-worlds-biggest-bitcoin-mine-...
mgfist 14 hours ago|||
It's funny how people complain about the rust belt dying and factories leaving rural communities and so on, then when someone wants to build something that can provide jobs and tax revenue, everyone complains.
jakeydus 14 hours ago|||
How many people are employed at the average data center? A few dozen? Versus a steel mill, that’s nothing. A chicken plant in Nebraska closed down this last month. 3200 people lost their jobs. You think Meta will fill it with GPUs and the whole town will have jobs again?
scotty79 12 hours ago||
Many more are employed while building it. And they will never stop building; it's the modern version of rail, except that instead of covering distance it will cover area.
uxcolumbo 11 hours ago||
Will local folks get those jobs to build the data center?

And if so, what happens to those builders once the data center is built?

lostlogin 14 hours ago||||
I’ve heard about the risk of AI leading to job losses and wealth concentration.

I haven’t heard about new businesses, job creation and growth in former industrial towns. What have I missed?

techpression 11 hours ago|||
As if any taxes will be paid to the areas affected. And add to that the billions in tax money used to subsidize everything before a single cent becomes a net positive.
odiroot 9 hours ago||
I'm very relieved we've moved away from rewriting everything in Rust.
jll29 7 hours ago|||
There's no reason not to use Rust for LLM-generated code in the longer term (other than lack of Rust code to learn from in the shorter term).

The stricter typing of Rust would make semantic errors in generated code surface more quickly than in e.g. Python, because with static typing the chances are that some of the semantic errors are also type violations.
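
A rough illustration of that point (the newtype wrappers and the lookup function here are hypothetical, just to show the mechanism): a semantic mix-up that Python would only reveal at runtime, if at all, is rejected by rustc at compile time.

    // Wrapping raw u64 ids in distinct types means generated code that
    // confuses users with orders cannot even compile.
    struct UserId(u64);
    struct OrderId(u64);

    fn lookup_order(id: OrderId) -> String {
        format!("order #{}", id.0) // stand-in for a real lookup
    }

    fn main() {
        let user = UserId(42);
        println!("user #{}", user.0);

        // A model that mixes up the two concepts might emit this call;
        // rustc rejects it: expected `OrderId`, found `UserId`.
        // let order = lookup_order(user);

        println!("{}", lookup_order(OrderId(7))); // the correct call compiles
    }

With plain integer ids in a dynamically typed language, the equivalent wrong call would run and quietly look up the wrong thing.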

michaelcampbell 7 hours ago|||
Have we though? I'm glad we're not shouting about it from the rooftops like it's some magical "win" button as much, but TBH the things I use routinely that HAVE been rewritten in Rust are generally much better. That could also just be because they're newer and can avoid repeating the errors of the past.
mrheosuper 11 hours ago||
I'm not against AI/LLMs (in fact, I'm quite supportive of them). But one of my biggest fears is overusing AI. We may introduce tools that only an AI/LLM can reasonably use (tools with weird, convoluted UI/UX or syntax) and nobody will push back, because an AI/LLM can use and interact with them just fine.

Then there's genAI. It's becoming more and more difficult to tell which is AI and which is not, and AI is everywhere. I don't know what to think about it. "If you can't tell, does it matter?"

netdur 29 minutes ago|
i think the concern about software shifting toward ai design ignores that the web hasn't been human-first for a long time. most traffic is already machine to machine, like crawlers and ci pipelines. we’ve tolerated systems that are barely legible for years. anyone who has grepped through android studio logs knows that human readability is usually a tertiary goal at best. ai interacting with complex systems is just an evolution of the glue code we’ve always written.

as for who made it, utility usually matters more than where it came from. i used an agent for an oss changelog recently and it picked up things i’d forgotten while structuring the narrative better than i could. the intent and code were mine, but the ai acted as a high fidelity compressor. the risk isn't ai being everywhere. it’s the atrophy of judgment where we stop using it to support decisions and start using it to outsource thinking.

didip 17 hours ago||
Indeed. I don't understand why Hacker News is so dismissive of the coming of LLMs; maybe HN readers are going through the five stages of grief?

But LLMs are certainly a game changer; I can see them delivering a bigger impact than the internet itself. Both require a lot of investment.

crystal_revenge 16 hours ago||
> I don't understand why Hacker News is so dismissive about the coming of LLMs

I find LLMs incredibly useful, but if you were following along the last few years, the promise was "exponential progress", with a teaser of world-destroying superintelligence.

We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.

I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)

viraptor 15 hours ago|||
> exponential progress

First you need to define what it means. What's the metric? Otherwise it's very much something you can argue about.

nicbou 6 hours ago|||
Time spent being human and enjoying life.

I can't point at many problems it has meaningfully solved for me. I mean real problems, not tasks that I have to do for my employer. It seems like it has just made parts of my existence more miserable, poisoned many of the things I love, and generally made the future feel a lot less certain.

noodletheworld 12 hours ago||||
> What's the metric?

Language model capability at generating text output.

The model progress this year has been a lot of:

- “We added multimodal”

- “We added a lot of non AI tooling” (ie agents)

- “We put more compute into inference” (ie thinking mode)

So yes, there is still rapid progress, but these ^ make it clear, at least to me, that next gen models are significantly harder to build.

Simultaneously we see a distinct narrowing between players (openai, deepseek, mistral, google, anthropic) in their offerings.

Thats usually a signal that the rate of progress is slowing.

Remind me what was so great about GPT-5? How about GPT-4 compared to GPT-3?

Do you even remember the releases? Yeah. I don't. I had to look it up.

Just another model with more or less the same capabilities.

"Mixed reception"

That is not what exponential progress looks like, by any measure.

The progress this year has been in the tooling around the models and in smaller, faster models with similar capabilities. Multimodal add-ons that no one asked for, because it's easier to add image and audio processing than to improve text handling.

That may still be on a path to AGI, but it is not an exponential path to it.

threethirtytwo 2 hours ago|||
I don't think the path was ever exponential, but your claim here reads almost as if the slowdown hit an asymptote-like wall.

Most of the improvements are intangible. Can we truly say how much more reliable the models are? We barely have quantitative measurements on this so it’s all vibes and feels. We don’t even have a baseline metric for what AGI is and we invalidated the Turing test also based on vibes and feels.

So my argument is that part of the slowdown is itself a hallucination, because the improvement is not actually measurable or definable outside of vibes.

dragonwriter 11 hours ago||||
> Language model capability at generating text output.

That's not a metric, that's a vague non-operationalized concept, that could be operationalized into an infinite number of different metrics. And an improvement that was linear in one of those possible metrics would be exponential in another one (well, actually, one that is was linear in one would also be linear in an infinite number of others, as well as being exponential in an infinite number of others.

That’s why you have to define an actual metric, not simply describe a vague concept of a kind of capacity of interest, before you can meaningfully discuss whether improvement is exponential. Because the answer is necessarily entirely dependent on the specific construction of the metric.

aoeusnth1 2 hours ago||||
> Language model capability at generating text output.

How would you put this on a graph?

viraptor 11 hours ago|||
> Language model capability at generating text output.

That's not a quantifiable sentence. Unless you put it in numbers, anyone can argue exponential/not.

> next gen models are significantly harder to build.

That's not how we judge capability progress though.

> Remind me what was so great about gpt 5? How about gpt4 from from gpt 3?

> Do you even remember the releases?

At gpt 3 level we could generate some reasonable code blocks / tiny features. (An example shown around at the time was "explain what this function does" for a "fib(n)") At gpt 4, we could build features and tiny apps. At gpt 5, you can often one-shot build whole apps from a vague description. The difference between them is massive for coding capabilities. Sorry, but if you can't remember that massive change... why are you making claims about the progress in capabilities?

> Multimodal add ons that no one asked for

Not only does multimodal input training improve the model overall, it's useful for (for example) feeding back screenshots during development.

scotty79 9 hours ago|||
Define it however you like. There's not a single chart you can draw that even begins to look like a sigmoid.
senordevnyc 55 minutes ago||||
I’ve been reading this comment multiple times a week for the last couple years. Constant assertions that we’re starting to hit limits, plateau, etc. But a cursory glance at where we are today vs a year ago, let alone two years ago, makes it wildly obvious that this is bullshit. The pace of improvement of both models and tooling has been breathtaking. I could give a shit whether you think it’s “exponential”, people like you were dismissing all of this years ago, meanwhile I just keep getting more and more productive.
fullstackchris 9 hours ago||||
I wrote an article complaining about the whole hype over a year ago:

https://chrisfrewin.medium.com/why-llms-will-never-be-agi-70...

Seems to be playing out that way.

scotty79 12 hours ago||||
> but we’re very clearly seeing sigmoid progress.

Yeah, probably. But no chart actually shows it yet. For now we are firmly in the exponential zone of the sigmoid curve and can't really tell if it's going to end in a year, a decade, or a century.

utopiah 11 hours ago||
It doesn't even matter if the goal is extremely high. Talking about exponential progress when we can clearly see energy needs growing to match proves there is no way we can maintain that pace without radical (and thus unpredictable) improvements.

My own "feeling" is that it's definitely not exponential but again, doesn't matter if it's unsustainable.

aoeusnth1 16 hours ago|||
We're very clearly seeing exponential progress - even above trend, on METR, whose slope keeps getting revised to a higher and higher estimate each time. Explain your perspective on the objective evidence against exponential progress?
llmslave2 15 hours ago||
Pretty neat how this exponential progress hasn't resulted in exponential productivity. Perhaps you could explain your perspective on that?
viraptor 15 hours ago|||
Writing the code itself was never the main bottleneck. Designing the bigger solution, figuring out tradeoffs, talking to affected teams, etc. takes as much time as it used to. But still, there's definitely a significant improvement in the code-production part in many areas.
mgfist 14 hours ago||||
Because that requires adoption. Devs on hackernews are already the most up to date folks in the industry and even here adoption of LLMs is incredibly slow. And a lot of the adoption that does happen is still with older tech like ChatGPT or Cursor.
belmont_sup 11 hours ago||
What’s the newer tech?
TeodorDyakov 9 hours ago||
Claude Code With Opus 4.5
HPMOR 15 hours ago||||
I think this is still an open question, and a very interesting one. Ilya discussed this on the Dwarkesh podcast. But the capabilities of LLMs are clearly growing exponentially, perhaps super-exponentially. We went from something that could string together incoherent text in 2022 to general models helping people like Terence Tao and Scott Aaronson write new research papers. LLMs also beat the IMO and the ICPC. We have entered the John Henry era for intellectual tasks...
tsimionescu 10 hours ago|||
> LLMs also beat IMO and the ICPC

Very spurious claims, given that no effort was made to check whether the IMO or ICPC problems were in the training set, or to quantify how far problems in the training set were from the contest problems. IMO problems are supposed to be unique, but since they're not at the frontier of math research, there is no guarantee that the same problem, or something very similar, was not already solved in some obscure manual.

llmslave2 14 hours ago|||
> But the capabilities of LLMs is clearly exponential and perhaps super exponential

By what metric?

utopiah 11 hours ago||
BS metric... /s
barrenko 8 hours ago||||
Sir, we're in a modern economy, we don't ever ever look at productivity graphs (this is not to disparage LLMs, just a comment on productivity in general)
scotty79 12 hours ago||||
How long did it take after the introduction of computers before average productivity increased? How long for the internet? Business is just slow to figure out how to use anything to its benefit, but it eventually gets there.
spectralista 9 hours ago|||
The best example is that even ATM machines didn't reduce bank teller jobs.

Why? Because even the bank teller is doing more than taking and depositing money.

IMO there is an ontological bias that pervades our modern society that confuses the map for the territory and has a highly distorted view of human existence through the lens of engineering.

We don't see anything in this time series, because this time series itself is meaningless nonsense that reflects exactly this special kind of ontological stupidity:

https://fred.stlouisfed.org/series/PRS85006092

As if the sum of human interaction in an economy is some kind of machine that we just need to engineer better parts for and then sum the outputs.

Any non-careerist, thinking person that studies economics would conclude we don't and will probably not have the tools to properly study this subject in our lifetimes. The high dimensional interaction of biology, entropy and time. We have nothing. The career economist is essentially forced to sing for their supper in a type of time series theater. Then there is the method acting of pretending to be surprised when some meaningless reductionist aspect of human interaction isn't reflected in the fake time series.

fmbb 11 hours ago|||
> How long before introduction of computers lead to increases in average productivity?

I think it never did. Still has not.

https://en.wikipedia.org/wiki/Productivity_paradox

aoeusnth1 15 hours ago|||
It has! CLs/engineer increased by 10% this year.

LLMs from late 2024 were nearly worthless as coding agents, so given they have quadrupled in capability since then (exponential growth, btw), it's not surprising to see a modestly positive impact on SWE work.

Also, I'm noticing you're not explaining yourself :)

surajrmal 12 hours ago|||
I think this is happening by raising the floor for job roles which are largely boilerplate work. If you are on the more skilled side or work in more original/niche areas, AI doesn't really help too much. I've only been able to use AI effectively for scaling refactors, not really much in feature development. It often just slows me down when I try to use it. I don't see this changing any time soon.
llmslave2 14 hours ago||||
Hey, I'm not the OG commentator, why do I have to explain myself! :)

When Fernando Alonso (best rookie, btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he'll be approaching the speed of light in 20 seconds?

lopatin 13 hours ago|||
> Hey, I'm not the OG commentator, why do I have to explain myself! :)

The issue is that you're not acknowledging or replying to people's explanations for _why_ they see this as exponential growth. It's almost as if you skimmed through the meat of the comment and then just re-phrased your original idea.

> When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?

This comparison doesn't make sense because we know the limits of cars but we don't yet know the limits of LLMs. It's an open question. Whether or not an F1 engine can make it the speed of light in 20 seconds is not an open question.

llmslave2 11 hours ago||
It's not in me to somehow disprove claims of exponential growth when there isn't even evidence provided of it.

My point with the F1 comparison is to say that a short period of rapid improvement doesn't imply exponential growth and it's about as weird to expect that as it is for an f1 car to reach the speed of light. It's possible you know, the regulations are changing for next season - if Leclerc sets a new lap record in Australia by .1 ms we can just assume exponential improvements and surely Ferrari will be lapping the rest of the field by the summer right?

aoeusnth1 1 hour ago||
There is already evidence of it! METR time horizons are going up on an exponential trend. This is literally the most famous AI benchmark and it has already been mentioned in this thread.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-...
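
For what it's worth, here is what that claim means numerically; the fifteen-minute starting horizon is an arbitrary made-up baseline, and the roughly-seven-month doubling time is the figure METR reported:

    // Toy illustration of an exponential time-horizon trend, assuming a
    // ~7-month doubling time (METR's reported figure) and a made-up
    // 15-minute starting horizon.
    fn main() {
        let doubling_months = 7.0_f64;
        let start_horizon_min = 15.0_f64;
        for months in [0, 7, 14, 21, 28] {
            let horizon = start_horizon_min * 2f64.powf(months as f64 / doubling_months);
            println!("after {months:>2} months: roughly {horizon:.0}-minute tasks");
        }
    }

Whether that trend continues is exactly what's in dispute, but at least it is a concrete, falsifiable metric.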

aoeusnth1 1 hour ago|||
If you're not going to explain yourself, at least stay on topic. We're talking about exponential growth, so address the points I'm making.
Madmallard 13 hours ago|||
A year ago, LLMs were better at a complex project I've repeatedly tried to do than they are now.
scotty79 12 hours ago||
Try Antigravity with Gemini 3 Pro. Seems very capable to me.
jcims 5 hours ago|||
It feels like there are several conversations happening that sound the same but are actually quite different.

One of them is whether or not large models are useful and/or becoming more useful over time. (To me, clearly the answer is yes)

The other is whether or not they live up to the hype. (To me, clearly the answer is no)

There are other skirmishes around capability for novelty, their role in the economy, their impact on human cognition, if/when AGI might happen, and the overall impact on the largely tech-oriented community on HN.

tgv 9 hours ago|||
The negatives outweigh the positives, if only because the positives are so small. A bunch of coders making their lives easier doesn't really matter, but pupils and students skipping education does. As a meme said: you had better start eating healthy, because your future doctor vibed his way through med school.
viraptor 15 hours ago|||
Based on quite a few comments recently, it also looks like many have tried LLMs in the past, but haven't seriously revisited either the modern or more expensive models. And I get it. Not everyone wants to keep up to date every month, or burn cash on experiments. But at the same time, people seem to have opinions formed in 2024. (Especially if they talk about just hallucinations and broken code - tell the agent to search for docs and fix stuff) I'd really like to give them Opus 4.5 as an agent to refresh their views. There's lots to complain about, but the world has moved on significantly.
mirsadm 11 hours ago|||
This has been the argument since day one: you just have to try the latest model, that's where you went wrong. For the record, I use Claude Code quite a bit and I can't see many meaningful improvements from the last few models. It is a useful tool, but its shortcomings are very obvious.
techpression 11 hours ago|||
Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.

When people say ”fix stuff” I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).

viraptor 10 hours ago|||
Sure, I get an occasional bad result from Opus - then I revert and try again, or ask it for a fix. Even with a couple of restarts, it's going to be faster than me on average. (And that's ignoring the situations where I have to restart myself)

Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.

b3kart 10 hours ago|||
The problem is it’s imperfect in very unpredictable ways. Meaning you always need to keep it on a short leash for anything serious, which puts a limit on the productivity boost. And that’s fine, but does this match the level of investment and expectations?
techpression 8 hours ago|||
It’s not about being perfect, it’s about not being as great as the marketing, and many proponents, claim.

The issue is that there’s no common definition of ”fixed”. ”Make it run no matter what” is a more apt description in my experience, which works to a point but then becomes very painful.

simonw 11 hours ago||||
What did Opus do when you told it that it shouldn't have done that?
baq 9 hours ago|||
Nice. Did it realize the mistake and correct it?
techpression 9 hours ago||
Nope, I did get a lot of fancy markdown with emojis though so I guess that was a nice tradeoff.

In general, even with access to the entire code base (which is very small), I find the models' inherent need to satisfy the prompter to be their biggest flaw, since it constantly leads down this path. I often have to correct overly convoluted SQL too, because my problems are simple and the training data seems to favor extremely advanced operations.

zvolsky 16 hours ago|||
The idea of HN being dismissive of impactful technology is as old as HN. And indeed, in hindsight the crowd often appears stuck in the past. That said, HN discussions aren't homogeneous, and as demonstrated by Karpathy in his recent blog post "Auto-grading decade-old Hacker News", at least some commenters have impressive foresight: https://karpathy.bearblog.dev/auto-grade-hn/
brabel 10 hours ago||
So exactly 10 years ago a lot of people believed that the game Go would not be “conquered” by AI, but after just a few months it was. People will always be skeptical of new things, even people who are in tech, because many hyped things indeed go nowhere… while it may look obvious in hindsight, it’s really hard to predict what will and what won’t be successful. On the LLM front I personally think it’s extremely foolish to still consider LLMs as going nowhere. There’s a lot more evidence today of the usefulness of LLMs than there was of DeepMind being able to beat top human players in Go 10 years ago.
hapticmonkey 12 hours ago|||
It’s not the technology I’m dismissive about. It’s the economics.

25 years ago I was optimistic about the internet, web sites, video streaming, online social systems. All of that. Look at what we have now. It was a fun ride until it all ended up "enshittified". And it will happen to LLMs, too. Fool me once.

Some developer tools might survive in a useful state on subscriptions. But soon enough the whole A.I. economy will centralise into 2 or 3 major players extracting more and more revenue over time until everyone is sick of them. In fact, this process seems to be happening at a pretty high speed.

Once the users are captured, they’ll orient the ad-spend market around themselves. And then they’ll start taking advantage of the advertisers.

I really hope it doesn’t turn out this way. But it’s hard to be optimistic.

Al-Khwarizmi 9 hours ago||
Unlike with the internet, though, there is a way out: local, open-source LLMs getting good. I really hope they do, because enshittification does seem unavoidable if we depend on commercial offerings.
ndiddy 2 hours ago||
Well the "solution" for that will be the GPU vendors focusing solely on B2B sales because it's more profitable, therefore keeping GPUs out of the hands of average consumers. There's leaks suggesting that nVidia will gradually hike the prices of their 5090 cards from $2000 to $5000 due to RAM price increases ( https://wccftech.com/geforce-rtx-5090-prices-to-soar-to-5000... ). At that point, why even bother with the R&D for newer consumer cards when you know that barely anyone will be able to afford them?
asielen 16 hours ago|||
It is an overcorrection because of all the empty promises around LLMs. I use Claude and ChatGPT daily at work and am amazed at what they can do and how far they've come.

BUT when I hear my executive team talk and see demos of "Agentforce" and every saas company becoming an AI company promising the world, I have to roll my eyes.

The challenge I have with LLMs is that they are great at creating first-draft shiny objects, and the LLMs themselves overpromise. I am handed half-baked work created by non-technical people that I now have to clean up. And they don't realize how much work it is to take something from a 60% solution to a 100% solution, because it was so easy for them to get to the 60%.

Amazing, game changing tools in the right hands but also give people false confidence.

Not that they aren't also useful for non-technical people, but I have had to spend a ton of time explaining to copywriters on the marketing team that they shouldn't paste their credentials into the chat even if it tells them to, and that their vibe-coded app is a security nightmare.

semilin 12 hours ago||
This seems like the right take. The claims of the imminence of AGI are exhausting and to me appear dissonant with reality. I've tried gemini-cli and Claude Code and while they're both genuinely quite impressive, they absolutely suffer from a kind of prototype syndrome. While I could learn to use these tools effectively for large-scale projects, I still at present feel more comfortable writing such things by hand.

The NVIDIA CEO says people should stop learning to code. Now, if LLMs really do end up as reliable as compilers, such that they can write code that's better and faster than I can 99% of the time, then he might be right. As things stand now, that reality seems far-fetched. To claim that they're useless because this reality has not yet been achieved would be silly, but not more silly than claiming programming is a dead art.

phatfish 9 hours ago|||
Maybe because the hype for a next-gen search engine that can also just make things up when you query it is a bit much?
probably_wrong 16 hours ago|||
Speaking for myself: because if the hype were to be believed, we should have no relational databases when there's MongoDB, no need for dollars when there are cryptocoins, all virtual goods would be sold exclusively as NFTs, and we would all be driving self-driving cars by now.

LLMs are being driven mostly by grifters trying to achieve a monopoly before they run out of cash. Under those conditions I find their promises hard to believe. I'll wait until they either go broke or stop losing money left and right, and whatever is left is probably actually useful.

simonw 16 hours ago||
The way I've been handling the deafening hype is to focus exclusively on what the models that we have right now can do.

You'll note I don't mention AGI or future model releases in my annual roundup at all. The closest I get to that is expressing doubt that the METR chart will continue at the same rate.

If you focus exclusively on what actually works the LLM space is a whole lot more interesting and less frustrating.

magicalhippo 9 hours ago|||
> focus exclusively on what the models that we have right now can do

I'm just a casual user, but I've been doing the same and have noticed the sharp improvements of the models we have now vs a year ago. I have OpenAI Business subscription through work, I signed up for Gemini at home after Gemini 3, and I run local models on my GPU.

I just ask them various questions where I know the answer well, or I can easily verify. Rewrite some code, factual stuff etc. I compare and contrast by asking the same question to different models.

AGI? Hell no. Very useful for some things? Hell yes.

nasnsjdkd 14 hours ago|||
[flagged]
cebert 17 hours ago|||
Many people feel threatened by the rapid advancements in LLMs, fearing that their skills may become obsolete, and in turn they act irrationally. To navigate this change effectively, we must keep open minds, stay adaptable, and embrace continuous learning.
reppap 5 hours ago|||
I'm not threatened by LLMs taking my job as much as by them taking away my sanity. Every time I tell someone no and they come back with a "but Copilot said..." followed by something entirely incorrect, it makes me want to autodefenestrate.
callc 5 hours ago||
I am happy “autodefenestrate” is the first new word I learned in 2026. Thank you.

Autodefenestrate - To eject or hurl oneself from a window, especially lethally

rgoulter 9 hours ago||||
Many comments discussing LLMs involve emotions, sure. :) Including, obviously, comments in favour of LLMs.

But most discussion I see is vague and without specificity and without nuance.

Recognising the shortcomings of LLMs makes comments praising LLMs that much more believable; and recognising the benefits of LLMs makes comments criticising LLMs more believable.

I'd completely believe anyone who says they've found the LLM very helpful at greenfield frontend tasks, and I'd believe someone who found the LLM unable to carry out subtle refactors on an old codebase in a language that's not Python or JavaScript.

chii 16 hours ago||||
> in turn act irrationally

it isn't irrational to act in self-interest. If LLMs threaten someone's livelihood, it matters not one bit that they help humanity overall; that person will oppose them. I don't blame them. But I also hope they cannot succeed in opposing it.

Davidzheng 16 hours ago||
It's irrational to genuinely hold false beliefs about capabilities of LLMs. But at this point I assume around half of the skeptics are emotionally motivated anyway.
jdhsgsvsbzbd 15 hours ago||
As opposed to those who have skin in the game for LLMs and are blind to their flaws???

I'd assume that around half of the optimists are emotionally motivated in the same way.

nickphx 17 hours ago|||
rapid advancements in what? hallucinations..? FOMO marketing? certainly nothing productive.
vunderba 15 hours ago|||
> I don't understand why Hacker News is so dismissive about the coming of LLMs.

Eh. I wouldn’t be so quick to speak for the entirety of HN. Several articles related to LLMs easily hit the front page every single day, so clearly there are plenty of HN users upvoting them.

I think you're just reading too much into what is more likely classic HN cynicism and/or fatigue.

utopiah 11 hours ago|||
It's because both "side" tries to re-adjust.

When an "AI skeptic" sees a very positive AI comment, they try to argue that it is indeed interesting but nowhere near close to AI/AGI/ASI or whatever the hype at the moment uses.

When an "AI optimistic" sees a very negative AI comment, they try to list all the amazing things they have done that they were convinced was until then impossible.

ewoodrich 14 hours ago|||
Exactly. There was a stretch of 6 months or so right after ChatGPT was released where approximately 50% of front page posts at any given time were related to LLMs. And these days every other Show HN is some kind of agentic dev tool and Anthropic/OpenAI announcements routinely get 500+ comments in a matter of hours.
Atomic_Torrfisk 4 hours ago|||
> HN readers are going through 5 stages of grief

So we are just irrational and sour?

Night_Thastus 16 hours ago|||
LLMs hold some real utility. But that real utility is buried under a mountain of fake hype and over-promises to keep shareholder value high.

LLMs have real limitations that aren't going away any time soon - not until we move to a new technology fundamentally different and separate from them - sharing almost nothing in common. There's a lot of 'progress-washing' going on where people claim that these shortfalls will magically disappear if we throw enough data and compute at it when they clearly will not.

Gigachad 16 hours ago||
Pretty much. What actually exists is very impressive. But what was promised and marketed has not been delivered.
visarga 16 hours ago|||
I think the missing ingredient is not something the LLMs lack, but something we as developers don't do: we need to constrain, channel, and guide agents by creating reactive test environments around them. Not vibes but hard tests are what coding agents are missing. You can even use AI to write most of these tests, but the end result depends on how well you structured your code to be testable.

If you inherit 9000 tests from an existing project, you can vibe-code a replacement on your phone over a holiday, like Simon Willison's JustHTML port. We are moving from agents semi-randomly flailing around to constraint satisfaction.

coffeebeqn 15 hours ago||||
Yes, and most of the post-GPT-4 investment has been a kind of bet that things will get exponentially more impressive.
baq 9 hours ago||||
I find opus 4.5 and gpt 5.2 mind blowing more often than I find them dumb as rocks. I don’t listen to or read any marketing material, I just use the tools. I couldn’t care less about what the promises are, what I have now available to me is fundamentally different from what I had in August and it changed completely how I work.
rustystump 16 hours ago|||
Markets never deliver. That isn't new. I do think LLMs are not far off from Google in terms of impact.

Search, as of today, is inferior to frontier models as a product. However, the best case still misses expected returns by miles, which is where the grousing comes from.

Generative art/AI is still up in the air in terms of staying power, but I'd predict it isn't going away.

claudiug 1 hour ago|||
Because of the lies. All the people involved in this, the ones with C-suite titles, keep telling us how great it is now.
snigsnog 17 hours ago|||
The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level. Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.

The most wide-appeal possibility is people loving 100%-AI-slop entertainment like that AI Instagram Reels product. Maybe I'm just too disconnected from normies, but I don't see this taking off. Fun as a novelty, like those Ring cam vids, but I would never spend all day watching AI-generated media.

nen-nomad 16 hours ago|||
ChatGPT has roughly 800 million weekly active users. Almost everyone around me uses it daily. I think you are underestimating the adoption.
arctic-true 55 minutes ago|||
Usage plunges on the weekends and during the summer, suggesting that a significant portion of users are students using ChatGPT for free or at heavily subsidized rates to do homework (i.e., extremely basic work that is extraordinarily well-represented in the training data). That usage will almost certainly never be monetizable, and it suggests nothing about the trajectory of the technology’s capability or popularity. I suspect ChatGPT, in particular, will see its usage slip considerably as the education system (hopefully) adapts.
simonw 27 minutes ago||
The summer slump was a thing in 2023 but apparently didn't repeat in 2024: https://www.similarweb.com/blog/insights/ai-news/chatgpt-bea...

The weekend slumps could equally suggest people are using it at work.

dragonwriter 1 hour ago||||
"Almost everyone will use it at free or effectively subsidized prices" and "It delivers utility which justifies its variable costs plus fixed costs amortized over its useful lifetime" are not the same thing, and it's not clear how much of the use is tied to novelty, such that if new and progressively more expensive-to-train releases stopped arriving at a regular cadence, usage, even at subsidized prices, would drop off too.
throw1235435 11 hours ago||||
How many pay? And out of that how many are willing to pay the amount to at least cover the inference costs (not loss leading?)

Outside the verifiable domains I think the impact is more assistance/augmentation than outright disruption (i.e. a novelty which is still nice). A little tiny bit of value sprinkled over a very large user base but each person deriving little value overall.

Even when they use it as search, it is at best an incremental improvement on what they used to do - not life-changing.

mrweasel 2 hours ago||||
The adoption is just so weird to me. I cannot for the life of me get an LLM chatbot to work for me. Every time I try, I get into an argument with the stupid thing. They are still wrong constantly, and when I'm wrong they won't correct me.

I have great faith in AI in e.g. medical equipment, or otherwise as something built in, working on a single problem in the background, but the chat interface is terrible.

danielbln 10 hours ago|||
Even my mom and aunts are using it frequently for all sorts of things, and it took them a long time to hop onto the internet and smartphones at first.
raincole 16 hours ago||||
The early internet and smartphones (the Japanese ones, not the iPhone) were definitely not "immediately" adopted by the masses, unlike LLMs.

If "immediate" usefulness is the metric we measure, then the internet and smartphones are pretty insignificant inventions compared to LLMs.

(Of course it's not a meaningful metric, as there is no clear line between a dumb phone and a smartphone, or between a moderately sized language model and an LLM.)

JumpCrisscross 17 hours ago||||
> AI is not even close to that level

Kagi’s Research Assistant is pretty damn useful, particularly when I can have it poll different models. I remember when the first iPhone lacked copy-paste. This feels similar.

(And I don’t think we’re heading towards AGI.)

SgtBastard 17 hours ago||||
… the internet was not immediately useful in a million different ways for almost every person.

Even if you skip ARPANET, you're forgetting the Gopher days, and even if you jump straight to WWW+email==the internet, you're forgetting the Mosaic days.

The applications that became useful to the masses emerged a decade+ after the public internet and even then, it took 2+ decades to reach anything approaching saturation.

Your dismissal is not likely to age well, for similar reasons.

chii 16 hours ago||
the "usefulness" excuse is irrelevant, and the claim that phones/internet is "immediately useful" is just a post hoc rationalization. It's basically trying to find a reasonable reason why opposition to AI is valid, and is not in self-interest.

The opposition to AI is from people who feel threatened by it, because it either threatens their livelihood (or family/friends'), and that they feel they are unable to benefit from AI in the same way as they had internet/mobile phones.

duchef 10 hours ago||
The usefulness of mobile phones was identifiable immediately, and it is absolutely not 'post hoc rationalization'. The issue was cost: once low-cost mobile phones were produced they almost immediately became ubiquitous (see the Nokia share price from the release of the Nokia 6110 onwards, for example).

This barrier does not exist for current AI technologies which are being given away free. Minor thought experiment - just how radical would the uptake of mobile phones have been if they were given away free?

jfyi 1 hour ago||
It's only low cost for general usage chat users. If you are using it for anything beyond that, you are paying or sitting in a long queue (likely both).

You may just be a little early to the renaissance. What happens when the models we have today run on a mobile device?

The Nokia 6110 was released 15 years after the first commercial cell phone.

fragmede 16 hours ago||||
> The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level.

Those are some very rosy glasses you've got on there. The nascent internet took forever to catch on. It was "for weird nerds at universities" and "would never catch on", but here we are.

what-the-grump 16 hours ago||||
A year after the iPhone came out… it didn't have an App Store, could barely play video, and barely had enough power to last a day. You just don't remember, or weren't around for it.

A year after llms came out… are you kidding me?

Two years?

10 years?

Today, adding an MCP server to wrap the same API that's been around forever for some system makes the users of that system prefer the natural-language interface over the GUI almost immediately.

staticassertion 17 hours ago|||
> Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.

I know a lot of "normal" people who have completely replaced their search engine with AI. It's increasingly a staple for people.

Smartphones were absolutely NOT immediately useful in a million different ways for almost every person, that's total revisionist history. I remember when the iPhone came out, it was AT&T only, it did almost nothing useful. Smartphones were a novelty for quite a while.

brabel 10 hours ago||
I agree with most points, but as a tech enthusiast I was using a smartphone years before the iPhone, and I could already use the internet, make video calls, email, etc. around 2005. It was a small flip phone, but it was not uncommon for phones to do that already at the time, at least in Australia and parts of Asia (a Singaporean friend told me about the phone).
Madmallard 13 hours ago|||
Have you tried using it for anything actually complicated?

Lol. It's worse than nothing at all.

lukaslalinsky 13 hours ago||
I think the split between vibe coding and AI-assisted coding will only widen over time. If you ask LLMs to do something complex, they will fail and you waste your time. If you work with them as a peer, and you delegate tasks to them, they will succeed and you save your time.
watwut 11 hours ago||
I work with peers by delegating complex tasks to them while I do other complex tasks.
mmcnl 7 hours ago||
Let's hope 2026 will also have interesting innovations not related to AI or LLMs.
spicyusername 4 hours ago|
2025 had plenty of those, they just didn't get as many news headlines.

One of the difficult things of modernity is that it's easy to confuse what you hear about a lot with what is real.

One of the great things about modernity is that progress continues, whether we know about it or not.

timonoko 8 hours ago||
OpenSCAD coding has improved significantly across all models. Now the syntax is always right and they understand the concept of negative space.

The only problem is that they don't see the connection between form and function. They may model a teapot perfectly but don't understand that the form is supposed to contain liquid.

AndyNemmity 19 hours ago|
These are excellent every year, thank you for all the wonderful work you do.
tkgally 18 hours ago|
Same here. Simon is one of the main reasons I’ve been able to (sort of) keep up with developments in AI.

I look forward to learning from his blog posts and HN comments in the year ahead, too.

password4321 15 hours ago||
Don't forget you can pay Simon to keep up with less!

> At the end of every month I send out a much shorter newsletter to anyone who sponsors me for $10 or more on GitHub

https://simonwillison.net/about/#monthly

th0ma5 2 hours ago||
[flagged]