Posted by __rito__ 1 day ago

Auto-grading decade-old Hacker News discussions with hindsight (karpathy.bearblog.dev)
Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
610 points | 255 comments
alister 1 day ago|
> https://karpathy.ai/hncapsule/2015-12-24/index.html#article-...

I wonder why ChatGPT refused to analyze it?

The HN article was "Brazil declares emergency after 2,400 babies are born with brain damage" but the page says "No analysis available".

bspammer 1 day ago|
My guess is that it’s because there’s a lot of very negative comments about Brazil in that article. Trying to grade people for their opinions on a topic like that gets into dangerous territory.
jeffnappi 1 day ago||
The analysis of the 2015 article about Triplebyte is fascinating [1]. Particularly the Awards section.

1. https://karpathy.ai/hncapsule/2015-12-08/index.html#article-...

karmickoala 1 day ago||
I understand the exercise, but I think it should have a disclaimer. Some of the LLM reviews show a bias: when I read the comments, they turned out not to be as bad as the LLM made them out to be. As this hits the front page, some people will only read the title and not the accompanying blog post, losing all of the nuance.

That said, I understand the concept and love what you did here. By exposing this to the best disinfectant, sunlight, I hope it will raise awareness and show how people and corporations should be careful about its usage. This tech is now accessible to anyone in a couple of hours, not only to big tech.

It also shows how we should take the results of any LLM analysis at this scale with a grain of salt. Our private channels and messages on software like Teams and Slack can now be analyzed to hell by our AI overlords. I'm probably going to remove a lot of things from cloud drives just in case. Perhaps online discourse will deteriorate into more inane, LinkedIn-style content.

Also, I like that your prompt itself has some purposefully leaked bias, which shows other risks—¹(for instance, "fsflover: F", which may align the LLM to grade handles related to free software and open source more harshly).

As a meta concept of this, I wonder how I'll be graded by our AI overlords in the future now that I have posted something dismissive of it.

¹Alt+0151

intheitmines 1 day ago||
Interesting that for the December 16, 2015 "geohot is building Comma" article, it graded geohot's comments on the thread as only a B.
snowwrestler 1 day ago|
Presumably because of how things went with Comma since then.
ComputerGuru 1 day ago||
Looking at the results and the prompt, I would tweak the prompt to

* ignore comments that do not speculate on something that was unknown or had not achieved consensus as of the date of yyyy-mm-dd

* at the same time, exclude speculations for which there still isn’t a definitive answer or consensus today

* ignore comments that speculate on minor details or are stating a preference/opinion on a subjective matter

* it is ok to generate an empty list of users for a thread if there are no comments meeting the speculation requirements laid out above

* etc
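For concreteness, here is a rough sketch of how those criteria might be folded into a grading prompt. The rule wording, the cutoff date, and the `build_prompt` helper below are illustrative assumptions, not the project's actual prompt:

```python
# Illustrative only: the rule text and cutoff date are assumptions,
# not the prompt actually used by the hncapsule project.
CUTOFF_DATE = "2015-12-24"

GRADING_RULES = f"""When grading, consider only falsifiable predictions:
- Ignore comments that do not speculate on something that was unknown,
  or had not reached consensus, as of {CUTOFF_DATE}.
- Exclude speculations for which there is still no definitive answer
  or consensus today.
- Ignore speculation about minor details, and statements of pure
  preference or opinion on subjective matters.
- If no comment in the thread meets these requirements, return an
  empty list of users rather than forcing grades."""

def build_prompt(thread_text: str) -> str:
    """Prepend the filtering rules to the thread being graded."""
    return GRADING_RULES + "\n\nThread:\n" + thread_text
```

The key design choice is the last rule: giving the model explicit permission to return an empty list reduces the pressure to manufacture grades for threads with no real predictions.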

losvedir 1 day ago||
Agreed. I feel like it's more just a collection of good comments. It doesn't surprise me to see tptacek, patio11, etc. there. I think the "prediction" aspect is underweighted.

But it reminds me that I miss Manishearth's comments! Whatever happened to him? I recall him being a big Rust contributor. I'd think he'd be all over the place, with Rust's adoption since then. I also liked tokenadult. Interesting blast from the past.

janalsncm 1 day ago|||
You would also need to exclude “predictions” for things which already happened at the time they were predicted.
xpe 1 day ago||
Good points. To summarize: for a given thread, one presumably must downselect to the comments that can reasonably be interpreted as forecasts. I see some indicators that the creator of the project (despite his amazing reputation) skated over this part.
nomel 1 day ago||
> I realized that this task is actually a really good fit for LLMs

I've found the opposite, since these models still fail pretty wildly at nuance. I think it's a conceptual "needle in the haystack" sort of problem.

A good test is to find some thread where there's a disagreement and have it try to analyze the discussion. It will usually strongly misrepresent what was being said, by each side, and strongly align with one user, missing the actual divide that's causing the disagreement (a needle).

gowld 1 day ago|
As always, which model versions did you use in your test?
nomel 20 hours ago||
Claude Opus 4.5, Gemini 3 Pro, ChatGPT 5.1. Haven't tried ChatGPT 5.2.

It requires that the discussion has nuance, to see the failure. Gemini is, by far, the worst at this (which fits my suspicion that they heavily weighted Reddit posts).

I don't think this is all that strange though. The human on one side of the argument is also missing the nuance, which is the source of the conflict. Is there a belief that AI has surpassed the average human at conversational nuance!?

swalsh 1 day ago||
I have never felt less confident in the future than I do in 2025... and it's such a stark contrast. I guess if you split things down the middle, AI probably continues to change the world in dramatic ways but not in the all or nothing way people expect.

A non-trivial number of people get laid off, likely due to a financial crisis that is used as an excuse for companies to scale up their use of AI. Good chance the financial crisis was partly caused by AI companies, which ironically makes AI cheaper as infra is bought up on the cheap (so there is a consolidation, but the bountiful infra keeps things cheap). That results in increased usage (over a longer period of time). And even when the economy starts coming back, the jobs numbers stay abysmal.

Politics is divided into 2 main groups: those who are employed and those who are retired. The retired group is VERY large and has a lot of power. They mostly care about entitlements. The employed-age people focus on AI, which is making the job market quite tough. There are 3 large political forces (but 2 parties): the Left, the Right, and the Tech Elite. The left and the right both hate AI, but the tech elite, though a minority, has outsized power in their tie-breaker role. The age distributions would surprise most: most older people are now on the left, and most younger people are split by gender. The right focuses on limiting entitlements, and the left focuses on growing them by taxing the tech elite. The right maintains power by not threatening the tech elite.

Unlike in the 20th century, America has a more focused global agenda. We're not policing everyone, just the core trading powers. We have not gone to war with China, and China has not taken over Taiwan.

Physical robotics is becoming a pretty big thing, and space travel is becoming cheaper. We have at least one robot on an asteroid mining it. The yield is trivial, but we all thought it was neat.

Energy is much, much greener, and you wouldn't have guessed it... but it was the data centers that got us there. The tech elite needed it quickly and used their political connections to cut red tape and build really quickly.

1121redblackgo 1 day ago||
We do not currently have the political apparatus in place to stop the dystopian nightmares depicted in movies and media. They were supposed to be cautionary tales. Maybe they still can be, but there are basically zero guardrails in non-progressive forms of government to prevent massive accumulations of power being wielded in ways most of the population disapproves of.
samdoesnothing 1 day ago||
That's the whole point of democracy: to prevent the ruling parties from doing wildly unpopular things. Unlike a dictatorship, where they can do anything (including good things that otherwise wouldn't happen in a democracy).

I know that "X is destroying democracy, vote for Y" has been a prevalent narrative lately, but is there any evidence that it's true? I get that it's death by a thousand cuts, or "one step at a time" as they say.

xpe 21 hours ago||
> I know that "X is destroying democracy, vote for Y" has been a prevalent narrative lately, but is there any evidence that it's true? I get that it's death by a thousand cuts, or "one step at a time" as they say.

I suggest reading [1], [2], and [3]. From there, you'll probably have lots of background to pose your own research questions. According to [4], until you write about something, your thinking will be incomplete, and I tend to agree nearly all of the time.

[1]: https://en.wikipedia.org/wiki/Democratic_backsliding

[2]: https://hub.jhu.edu/2024/08/12/anne-applebaum-autocracy-inc/

[3]: https://carnegieendowment.org/research/2025/08/us-democratic...

[4]: "Neuroscientists, psychologists and other experts on thinking have very different ideas about how our brains work, but, as Levy writes: “no matter how internal processes are implemented, (you) need to understand the extent to which the mind is reliant upon external scaffolding.” (2011, 270) If there is one thing the experts agree on, then it is this: You have to externalise your ideas, you have to write. Richard Feynman stresses it as much as Benjamin Franklin. If we write, it is more likely that we understand what we read, remember what we learn and that our thoughts make sense." - Sönke Ahrens, How to Take Smart Notes, p. 30

Karrot_Kream 1 day ago||
Are you in the wrong thread?
dschnurr 1 day ago||
Nice! Something must be in the air – last week I built a very similar project using the historical archive of all-in podcast episodes: https://allin-predictions.pages.dev/
sanex 1 day ago|
I'll use this as evidence supporting my continued demand for a Friedberg only spinoff.
smugma 1 day ago|
I believe that the GPA calculation is off, maybe just for F's.

I scrolled to the bottom of the hall of fame/shame and saw that entry #1505 had 3 F's and a D, with an average grade of D+ (1.46).

With no grade better than a D, that shouldn't average to a D+; I'd expect it to be closer to 0.25.
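On a standard 4.0 scale (A=4 down to F=0; an assumption, since the site doesn't document its mapping), the expected average is easy to check:

```python
# Standard 4.0 grade points; the site's actual mapping is an assumption.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def gpa(grades):
    """Average grade points over a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

# Entry #1505: three F's and a D.
print(gpa(["F", "F", "F", "D"]))  # prints 0.25, not the 1.46 shown
```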
