LLM Year in Review - Hacker News

Posted by swyx 12/19/2025

LLM Year in Review (karpathy.bearblog.dev)

384 points | 146 comments | page 2

lysecret 12/20/2025|

It’s funny how every podcaster/public AI figure is so certain text as a UI will go away, and it’s not going anywhere.

devalexwells 12/20/2025||

A few days ago I was trying to unsubscribe from a service (notably an AI 3D modeling tool that I was curious about).

I spent 5 minutes trying to find a way to unsubscribe and couldn't. Finally, I found it buried in the plan page as one of those low-contrast ellipses on the plan card.

Instead of unsubscribing me or taking me to a form, it opened a conversation with an AI chatbot with a preconfigured "unsubscribe" prompt. I have never felt angrier at a UI: I had to waste even more time talking to a robot before it would render the unsubscribe button in the chat.

Why would we bring the most hated feature of automated phone calls to apps? As a frontend engineer I am horrified by these trends.

tim333 12/20/2025||

It's probably increased during my lifetime. People used to talk, now they sit and text into smartphones.

gessha 12/20/2025||

There might be some confusion about the transition to what some call the post-literate era: an era where text is not the primary medium. That’s not necessarily bad, because you get the advantages of other mediums, oral and visual, but it is something to keep in mind.

tim333 12/20/2025||

I'm a bit skeptical that a post-literate era is happening. I gather it appears in some sci-fi but I don't see much sign of it in reality. I mean, here we are on a text-only site. If anything we seem to be heading for a 100% literate society. Literacy graphs here: https://ourworldindata.org/grapher/cross-country-literacy-ra...

gessha 12/20/2025||

I don’t think the post-literate era means that text will disappear. I think it’s just not going to be dominant anymore, but I also have my reservations since I do prefer the text medium.

andai 12/20/2025||

The bit about o3 being the turning point is very interesting. I heard someone say that o3 (or perhaps the cheaper o4-mini) should have been called gpt-5, and that people would have been mind blown. Instead it kind of went under the radar as far as the mainstream goes.

Whereas we just got the incremental progress with gpt-5 instead and it was very underwhelming. (Plus like 5 other issues at launch, but that's a separate story ;)

I'm not sure if o4-mini would have made a good default gpt though. (Most use is conversational and its language is very awkward.) So they could have just called it gpt-5 pro or something, and put it on the $20 tier. I don't know.

karpathy 12/20/2025|

I agree with this fwiw, for many months I talked to people who never used o3 and didn’t know what it was because it sounded weird. Maybe it wasn’t obvious at the time but that was a good major point release to make then.

cheesecompiler 12/20/2025||

Excellent, more grounded review. A few questions:

> LLMs are emerging as a new kind of intelligence, simultaneously a lot smarter than I expected and a lot dumber than I expected

Isn't this concerning? How can we know which one we get? In the realm of code it's easier to tell when mistakes are being made.

> regular people benefit a lot more from LLMs compared to professionals, corporations and governments

We thought this would happen with things like AppleScript, VB, visual programming. But instead, AI is currently used as a smarter search engine. The issue is that's also the area where it hallucinates the most. What do you think is the solution?

bgwalter 12/19/2025||

Vibe coding is sufficient for job hoppers who never finish anything and leave when the last 20% has to be figured out. Much easier to promote oneself as an expert and leave the hard parts to other people.

zingar 12/20/2025||

I’ve found incredible productivity gains writing (vibe coding) tools for myself that will never need to be “productionised” or even used by another person. Heck, even I will probably never use the latest log retrieval tool, which exists purely for Claude Code to invoke it. There is a ton of useful software yet to be written for which there _is_ no “last 20%”.

diamond559 12/20/2025||

These tools are so useful and make you so much more "productive" that you don't think anyone else would want to pay anything for them huh? Did your boss at least give you a big raise for your "productivity" increase, or maybe lay off some of your underperforming coworkers bc you are just so much better now?

simonw 12/20/2025|||

Do you mean vibe coding as in producing unreviewed code with LLMs and prompting at it until it appears to work, or vibe coding as a catch-all for any time someone uses AI assistance to help them write code?

bgwalter 12/20/2025||

Karpathy uses the term for all of this in the exuberant paragraph 5 of his blog post.

augment_me 12/20/2025||

Not all software is meant to be open-source, in production, and working on 100 platforms.

Sometimes the point of the software is to make an app with 2 buttons for your mom, to help her do her grocery shopping more easily.

andai 12/20/2025||

Here's the source for the jagged spiky intelligence diagram:

https://x.com/colin_fraser/status/1994235521812328695

https://karpathy.bearblog.dev/the-space-of-minds/

sireat 12/20/2025||

What is the current state-of-the-art workflow when working with legacy code across multiple languages?

This would be a 100 kLOC legacy project written in C++, Python, and jQuery-era JavaScript circa 2010. The original devs have long left. I would rather avoid C++ as much as possible.

I've been a GitHub Copilot (in VS Code) user since June of 2021 and still use it heavily, but the "more powerful IntelliSense" approach is limiting me on legacy projects.

Presumably I need to provide more context on larger projects.

I can get pretty far with just ChatGPT Plus and feeding it bits and pieces of the project. However, that seems like using the wrong tool.

Codex seems better for building things but not sure about grokking existing things.

Would Cursor be more suitable for just dumping the whole project (all languages), basically 4 different sub-projects, and then selectively activating what to include in queries?

sandos 12/20/2025|

I don't understand; the agent mode of Copilot will search the codebase and is pretty good at filling its own context, afaik. I never really feed any of our 100k+ line legacy codebase explicitly to the LLM.

alexgotoi 12/20/2025||

LLMs still need to bring clear added value to enterprise and corporate work; otherwise, they remain a geek’s toy.

Big media agencies that claim to use AI rely on strong creative teams who fine-tune prompts and spend weeks doing so. Even then, they don’t fully trust AI to slice long videos into shorter clips for social media.

Heavy administrative functions like HR or Finance still don’t get approval to expose any of their data to LLMs.

What I’m trying to say is that we are still in the early stages of LLM development, and as promising as this looks, it’s still far from delivering the real value that is often claimed.

gessha 12/20/2025|

I think their non-deterministic nature is what’s making them difficult to adopt. It’s hard to train somebody in the old way of “if you see this, do this” because when you call the LLM twice you most likely get different results.

It took a long time to computerize businesses and it might take some time to adopt/adapt to LLMs.

rldjbpin 12/22/2025||

personally the point about apps built on top of LLMs resonated the most. however the success of well-engineered tools for specific use cases, often with choice of model, goes to show that:

- benchmarks don't mean a lot for the frontier stuff, but can be interesting within the same series of models (smaller vs. larger). reminds me of comparing clock speeds between CPUs.

- the app layer can fill the gaps to squeeze out the most for a use case, but there is still no one-size-fits-all situation.

- often the discourse here or the perspective of people building seems disconnected from an average user. a lot of discussion in the post is irrelevant for the vast majority of users. e.g. as cool as a TUI can be, it is not an interface most users would gravitate towards.

while not directly related, other modalities are more exciting, and come thanks to applying techniques for handling text to other media forms, or in conjunction with them.

nkko 12/20/2025|

Beyond graduating students, I see model labs as “accelerators/incubators” bundling, launching, and productizing observed ideas that gain traction. The sheer strength of their platforms, the number of eyes watching them, near-zero marginal costs, and seemingly unlimited budgets mean that only slow decision-making can prevent them from becoming the next Amazons of everything.
