
Posted by __rito__ 12/10/2025

Auto-grading decade-old Hacker News discussions with hindsight (karpathy.bearblog.dev)
Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
686 points | 270 comments
neilv 12/10/2025|
> I spent a few hours browsing around and found it to be very interesting.

This seems to be the result of the exercise? No evaluation?

My concern is that, even if the exercise is only an amusing curiosity, many people will take the results more seriously than they should, and be inspired to apply the same methods to products and initiatives that adversely affect people's lives in real ways.

cootsnuck 12/10/2025|
> My concern is that, even if the exercise is only an amusing curiosity, many people will take the results more seriously than they should, and be inspired to apply the same methods to products and initiatives that adversely affect people's lives in real ways.

That will most definitely happen. We've already known for a while that algorithmic methods have been applied "to products and initiatives that adversely affect people's lives in real ways": https://www.scientificamerican.com/blog/roots-of-unity/revie...

I guess the question is whether LLMs will, for some reason, reinvigorate public sentiment / pressure for governing bodies to sincerely take up the ongoing responsibility of trying to lessen the unique harms that can be amplified by reckless implementation of algorithms.

SequoiaHope 12/10/2025||
This is great! Now I want to run this to analyze my own comments and see how I score and whether my rhetoric has improved in quality/accuracy over time!
godelski 12/10/2025||

  > I was reminded again of my tweets that said "Be good, future LLMs are watching". You can take that in many directions, but here I want to focus on the idea that future LLMs are watching. Everything we do today might be scrutinized in great detail in the future because doing so will be "free". A lot of the ways people behave currently I think make an implicit "security by obscurity" assumption. But if intelligence really does become too cheap to meter, it will become possible to do a perfect reconstruction and synthesis of everything. LLMs are watching (or humans using them might be). Best to be good.
Can we take a second and talk about how dystopian this is? Such an outcome is not inevitable; it relies on us making it so. The future is not deterministic; the future is determined by us. What's more, Karpathy has significantly more influence on that future than the average HN user.

We are doing something very *very* wrong if we are operating under the belief that this future is unavoidable. That future is simply unacceptable.

jacquesm 12/10/2025||
Given the quality of the judgment I'm not worried, there is no value here.

Tossing an idea off like this, rather than properly executing it and putting in the work to make it valuable, is exactly what irritates me about a lot of AI work. You can be 900 times as productive at producing mental popcorn, but if there was value to be had here, we're not getting it, just a whiff of it. Sure, fun project. But I don't feel particularly judged here. The funniest bit is the judgment on things that clearly could not yet have come to pass (for instance, because there is an exact date mentioned that we have not yet reached). QA could be better.

godelski 12/10/2025||
I think you're missing the actual problem.

I'm not worried about this project, but about harvesting and analyzing all that data and deanonymizing people.

That's exactly what Karpathy is saying. He's not being shy about it. He said "behave, because the future panopticon can look into the past", which makes the panopticon effectively exist now.

  Be good, future LLMs are watching
  ...
  or humans using them might be
That's the problem. Not the accuracy of this toy project, but the idea of monitoring everyone and their entire history.

The idea that we have to behave as if we're being actively watched by the government is literally the setting of 1984 lol. The idea that we have to behave that way now because a future government will use the Panopticon to look into the past is absolutely unhinged. You don't even know what the rules of that world will be!

Did we forget how unhinged the NSA's "harvest now, decrypt later" strategy is? Did we forget those giant data centers that were all the news talked about for a few weeks?

That's not the future I want to create. Is it the one you want?

To act as if that future is unavoidable is a failure of *us*

jacquesm 12/11/2025||
Yes, you are right, this is a real problem. But it really is just a variation on 'the internet never forgets', for instance in relation to teen behavior online. But AI allows for weaponization of such information. I wish the wannabe politicians of 2050 much good luck with their careers, they are going to be the most boring people available.
godelski 12/11/2025||
The internet never forgets, but you could be anonymous, or at least somewhat. That's getting harder and harder, though.

If such a thing isn't already possible (it is to a certain extent), we are headed towards a point where your words alone will be enough to fingerprint you.

jacquesm 12/11/2025||
Stylometry killed that a long time ago. There was a website, stylometry.net, that coupled HN accounts based on text comparison and ranked the 10 best candidates. It was incredibly accurate and allowed id'ing a bunch of people who had gotten banned but came back again. Based on that, I would expect anybody who has written more than a few KB of text to be id'able in the future.
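
The kind of matching such a site could do can be sketched in a few lines. This is an illustrative toy, not whatever method stylometry.net actually used: build character-trigram frequency profiles per account and rank candidates by cosine similarity to an unknown text.

```python
from collections import Counter
from math import sqrt

def profile(text, n=3):
    """Character n-gram frequency profile of a text."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(unknown_text, known_authors, top=10):
    """Rank candidate authors by stylistic similarity to an unknown text."""
    p = profile(unknown_text)
    scores = [(cosine(p, profile(txt)), name) for name, txt in known_authors.items()]
    return sorted(scores, reverse=True)[:top]
```

Real stylometry systems add function-word frequencies, punctuation habits, and sentence-length statistics, but even this crude profile separates writing styles surprisingly well once enough text is available.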
godelski 12/11/2025||
You need a person's text with their actual identity to pull that off. Normally that's pretty hard, especially since you'll get different formats. Like I don't write the same way on Twitter as HN. But yeah, this stuff has been advancing and I don't think it is okay.
jacquesm 12/11/2025||
The AOL scandal pretty much proved that anonymity is a mirage. You may think you are anonymous, but it just takes combining a few unrelated databases to de-anonymize you. HN users think they are anonymous, but they're not; they drop factoids all over the place about who they are. 33 bits... it is one of my recurring favorite themes, and anybody in the business of managing other people's data should be well aware of the risks.
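
The "33 bits" refers to log2 of the world population: about 33 bits of information suffice to single out one person on Earth, and every dropped factoid contributes bits. A back-of-the-envelope sketch (the prevalences below are made up for illustration, and independence between factoids is naively assumed):

```python
from math import log2

WORLD_POPULATION = 8_000_000_000  # rough 2020s figure

# Bits needed to single out one person among everyone on Earth: ~33
bits_to_identify = log2(WORLD_POPULATION)

def bits_revealed(prevalence):
    """Identifying bits leaked by an attribute shared by a fraction
    `prevalence` of the population (naively assuming independence)."""
    return -log2(prevalence)

# Hypothetical factoids a user might drop across comments:
factoids = {
    "lives in the Netherlands": 18e6 / WORLD_POPULATION,
    "works in tech": 0.01,
    "has founded a startup": 0.001,
}
total_bits = sum(bits_revealed(p) for p in factoids.values())
```

Three casual factoids already contribute roughly 25 of the ~33 bits needed, which is why seemingly harmless details combine so quickly into an identity.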
godelski 12/12/2025||
I think you're being too much of a conspiracy theorist here by making everything black and white.

Besides, the main problem is how difficult it is to deanonymize, not whether it is possible.

Privacy and security both have no perfect defense. For example, there is no password that is unhackable; there are only passwords that cannot be hacked given our current technology, budgets, and lifetimes. You could brute-force my HN password, but it would take billions of years.
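The "billions of years" intuition checks out arithmetically for a strong random password. The attacker speed and password length below are assumptions chosen for illustration, not a claim about any real account:

```python
from math import log2

ALPHABET = 94           # printable ASCII characters
LENGTH = 14             # a random 14-character password (assumed)
GUESSES_PER_SEC = 1e10  # a well-resourced offline attacker (assumed)

keyspace = ALPHABET ** LENGTH
expected_guesses = keyspace / 2  # on average, half the keyspace is searched
seconds = expected_guesses / GUESSES_PER_SEC
years = seconds / (365.25 * 24 * 3600)

# ~92 bits of entropy puts the expected crack time in the billions of years
entropy_bits = LENGTH * log2(ALPHABET)
```

Shrink the password to 10 random characters and the same attacker finishes in weeks, which is the whole point: security is a budget-and-time question, not an absolute.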

The same distinction is important here. My threat model on HN doesn't care if you need to spend millions of dollars or thousands of hours to deanonymize me. My handle is here to discourage that and to allow me to speak more freely about certain topics. I'm not trying to hide from nation states; I'm trying to hide from my peers in AI and tech, so I can freely discuss my opinions, which includes criticizing my own community (something I think everyone should do! Be critical of the communities we associate with). And more importantly, I want people to consider my points on their merit alone, not on my identity or status.

If I was trying to hide from nation states I'd do things very very differently, such as not posting on HN.

I'm not afraid of my handle being deanonymized, but I still think we should recognize the dangers of the future we are creating.

By oversimplifying, you've created the position that this is a lost cause, as if we have already lost and, because we lost, cannot change. There are multiple fallacies here. The future has yet to be written.

If you really believe it is deterministic, then what is the point of anything? Of having desires or opinions? Are we just waiting to see which algorithm wins out? Or are we the algorithms playing themselves out? If it's deterministic, wouldn't you be happy if the freedom algorithm won and this moment is an inflection in your programming? I guess that's impossible to say in an objective manner, but I'd hope that's how it plays out.

jacquesm 12/12/2025||
I have enough industry insights to prove that your data is floating out there, unprotected, in plain text and that those that are not bound by the law are making very good use of it. Every breach leaks more bits about you.

This is the main driver behind the targeted scams that ordinary people now have to deal with. It is why people get voice calls from loved ones in distress, why they get 'tech support' calls that aim to take over their devices and why lots of people have lost lots of money.

If you think I am too much of a conspiracy theorist, making everything black and white, maybe that is simply because we live different lives and have different experiences.

acyou 12/11/2025||
I call this the "judgement day" scenario. I would be interested if there is some science fiction based on this premise.

If you believe in God of a certain kind, you don't think being judged for your sins is unacceptable, or even good or bad in itself; you consider it inevitable. We have already talked it over for 2000 years; people like the idea.

godelski 12/11/2025||
You'll be interested in Clarke and Baxter's "The Light of Other Days": basically a wormhole through which people can look back at any point in time, ending all notion of privacy.

God is different, though. People like God because they believe God is fair and infallible. That is not true of machines or men. Similarly, I do not think people will like this idea. I'm sure some will, but look at people today and their religious fervor, or look at the past. They'll want it, but it is fleeting. Cults don't last forever, even when they're governments. It sounds like a great way to start wars, every one of them easily justified.

https://en.wikipedia.org/wiki/The_Light_of_Other_Days

nomel 12/11/2025||
> I realized that this task is actually a really good fit for LLMs

I've found the opposite, since these models still fail pretty wildly at nuance. I think it's a conceptual "needle in a haystack" sort of problem.

A good test is to find some thread where there's a disagreement and have it try to analyze the discussion. It will usually strongly misrepresent what was being said, by each side, and strongly align with one user, missing the actual divide that's causing the disagreement (a needle).

gowld 12/11/2025|
As always, which model versions did you use in your test?
nomel 12/11/2025||
Claude Opus 4.5, Gemini 3 Pro, ChatGPT 5.1. Haven't tried ChatGPT 5.2.

It requires that the discussion has nuance to see the failure. Gemini is, by far, the worst at this (which fits my suspicion that they heavily weighted reddit posts).

I don't think this is all that strange, though. The human on one side of the argument is also missing the nuance, which is the source of the conflict. Is there a belief that AI has surpassed the average human at conversational nuance?!

anshulbhide 12/11/2025||
I often summarise HN comments (which are sometimes more insightful than the original article) using an LLM. Total game-changer.
NooneAtAll3 12/11/2025||
UX feedback: I wish clicking on a new thread scrolled the right side back to the top.

Reading from the end isn't really useful, y'know :)

dw_arthur 12/10/2025||
Reading this I feel the same sense of dread I get watching those highly choreographed Chinese holiday drone shows.
JetSetWilly 12/12/2025||
It would be great to run this on a collection of interesting threads over different periods and not just one snapshot. For example, the thread from the day Trump was elected in 2016, the thread from the day of Brexit, and so on. Those are the times when people make many passionate predictions about how the future will play out; it would be good to see them retroactively scored.
rkuykendall-com 12/12/2025|
Assuming this keeps running, I suppose we just have to wait about a year.
jeffbee 12/10/2025|
I'm delighted to see that one of the users who makes the same negative comments on every Google-related post gets a "D" for saying Waymo was smoke and mirrors. Never change, I guess.