Posted by __rito__ 1 day ago

Auto-grading decade-old Hacker News discussions with hindsight (karpathy.bearblog.dev)
Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
610 points | 255 comments
alister 1 day ago|
> https://karpathy.ai/hncapsule/2015-12-24/index.html#article-...

I wonder why ChatGPT refused to analyze it?

The HN article was "Brazil declares emergency after 2,400 babies are born with brain damage" but the page says "No analysis available".

bspammer 1 day ago|
My guess is that it’s because there’s a lot of very negative comments about Brazil in that article. Trying to grade people for their opinions on a topic like that gets into dangerous territory.
jeffnappi 1 day ago||
The analysis of the 2015 article about Triplebyte is fascinating [1]. Particularly the Awards section.

1. https://karpathy.ai/hncapsule/2015-12-08/index.html#article-...

karmickoala 1 day ago||
I understand the exercise, but I think it should have a disclaimer. Some of the LLM reviews show a bias: when I read the comments, they turned out not to be as bad as the LLM made them out to be. As this hits the front page, some people will only read the title and not the accompanying blog post, losing all of the nuance.

That said, I understand the concept and love what you did here. By exposing this to the best disinfectant, sunlight, I hope it will raise awareness and show how people and corporations should be careful about its usage. This tech is now accessible to anyone in a couple of hours, not only to big tech.

It also shows how we should take the results of any LLM analysis at this scale with a grain of salt. Our private channels and messages on software like Teams and Slack can now be analyzed to hell by our AI overlords. I'm probably going to remove a lot of things from cloud drives just in case. Perhaps online discourse will deteriorate into more inane, LinkedIn-style content.

Also, I like that your prompt itself has some purposefully leaked bias, which shows other risks—¹(for instance, "fsflover: F", which may align the LLM to grade handles related to free software and open source more harshly).

As a meta concept of this, I wonder how I'll be graded by our AI overlords in the future now that I have posted something dismissive of it.

¹Alt+0151

intheitmines 1 day ago||
Interesting that for the December 16, 2015 "geohot is building Comma" article, it graded geohot's comments on the thread as only a B.
snowwrestler 1 day ago|
Presumably because of how things went with Comma since then.
ComputerGuru 1 day ago||
Looking at the results and the prompt, I would tweak the prompt to

* ignore comments that do not speculate on something that was unknown or had not achieved consensus as of the date of yyyy-mm-dd

* at the same time, exclude speculations for which there still isn’t a definitive answer or consensus today

* ignore comments that speculate on minor details or are stating a preference/opinion on a subjective matter

* it is ok to generate an empty list of users for a thread if there are no comments meeting the speculation requirements laid out above

* etc
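For concreteness, here is a rough sketch of how those criteria might be folded into a grading prompt. The rule wording, the cutoff date, and the `build_prompt` helper below are illustrative assumptions, not the project's actual prompt:

```python
# Illustrative only: the rule text and cutoff date are assumptions,
# not the prompt actually used by the hncapsule project.
CUTOFF_DATE = "2015-12-24"

GRADING_RULES = f"""When grading, consider only falsifiable predictions:
- Ignore comments that do not speculate on something that was unknown,
  or had not reached consensus, as of {CUTOFF_DATE}.
- Exclude speculations for which there is still no definitive answer
  or consensus today.
- Ignore speculation about minor details, and statements of pure
  preference or opinion on subjective matters.
- If no comment in the thread meets these requirements, return an
  empty list of users rather than forcing grades."""

def build_prompt(thread_text: str) -> str:
    """Prepend the filtering rules to the thread being graded."""
    return GRADING_RULES + "\n\nThread:\n" + thread_text
```

The key design choice is the last rule: giving the model explicit permission to return an empty list reduces the pressure to manufacture grades for threads with no real predictions.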

losvedir 1 day ago||
Agreed. I feel like it's more just a collection of good comments. It doesn't surprise me to see tptacek, patio11, etc. there. I think the "prediction" aspect is underweighted.

But it reminds me that I miss Manishearth's comments! Whatever happened to him? I recall him being a big Rust contributor. I'd think he'd be all over the place, with Rust's adoption since then. I also liked tokenadult. Interesting blast from the past.

janalsncm 1 day ago|||
You would also need to exclude “predictions” for things which already happened at the time they were predicted.
xpe 1 day ago||
Good points. To summarize: for a given thread, one presumably must downselect to the comments that can reasonably be interpreted as forecasts. I see some indicators that the creator of the project (despite his amazing reputation) skated over this part.
nomel 1 day ago||
> I realized that this task is actually a really good fit for LLMs

I've found the opposite, since these models still fail pretty wildly at nuance. I think it's a conceptual "needle in the haystack" sort of problem.

A good test is to find some thread where there's a disagreement and have it try to analyze the discussion. It will usually strongly misrepresent what was being said, by each side, and strongly align with one user, missing the actual divide that's causing the disagreement (a needle).

gowld 1 day ago|
As always, which model versions did you use in your test?
nomel 20 hours ago||
Claude Opus 4.5, Gemini 3 Pro, ChatGPT 5.1. Haven't tried ChatGPT 5.2.

It requires that the discussion has nuance, to see the failure. Gemini is, by far, the worst at this (which fits my suspicion that they heavily weighted Reddit posts).

I don't think this is all that strange though. The human on one side of the argument is also missing the nuance, which is the source of the conflict. Is there a belief that AI has surpassed the average human at conversational nuance!?

swalsh 1 day ago||
I have never felt less confident in the future than I do in 2025... and it's such a stark contrast. I guess if you split things down the middle, AI probably continues to change the world in dramatic ways but not in the all or nothing way people expect.

A non-trivial number of people get laid off, likely due to a financial crisis that is used as an excuse for companies to scale up their use of AI. Good chance the financial crisis was partly caused by AI companies, which ironically makes AI cheaper as infra is bought up on the cheap (so there is a consolidation, but the bountiful infra keeps things cheap). That results in increased usage (over a longer period of time). And even when the economy starts coming back, the jobs numbers stay abysmal.

Politics is divided into 2 main groups: those who are employed and those who are retired. The retired group is VERY large and has a lot of power. They mostly care about entitlements. The employed-age people focus on AI, which is making the job market quite tough. There are 3 large political forces (but 2 parties): the Left, the Right, and the Tech Elite. The left and the right both hate AI, but the tech elite, though a minority, has outsized power in their tie-breaker role. The age distributions would surprise most: most older people are now on the left, and most younger people are split by gender. The right focuses on limiting entitlements, and the left focuses on growing them by taxing the tech elite. The right maintains power by not threatening the tech elite.

Unlike in the 20th century, America has a more focused global agenda. We're not policing everyone, just the core trading powers. We have not gone to war with China, and China has not taken over Taiwan.

Physical robotics is becoming a pretty big thing, and space travel is becoming cheaper. We have at least one robot on an asteroid mining it. The yield is trivial, but we all thought it was neat.

Energy is much, much greener, and you wouldn't have guessed it... but it was the data centers that got us there. The tech elite needed it quickly and used their political connections to cut red tape and build really quickly.

1121redblackgo 1 day ago||
We do not currently have the political apparatus in place to stop the dystopian nightmares depicted in movies and media. They were supposed to be cautionary tales. Maybe they still can be, but there are basically zero guardrails in non-progressive forms of government to prevent massive accumulations of power being wielded in ways most of the population disapproves of.
samdoesnothing 1 day ago||
That's the whole point of democracy: to prevent the ruling parties from doing wildly unpopular things. Unlike a dictatorship, where they can do anything (including good things that otherwise wouldn't happen in a democracy).

I know that "X is destroying democracy, vote for Y" has been a prevalent narrative lately, but is there any evidence that it's true? I get that it's death by a thousand cuts, or "one step at a time" as they say.

xpe 21 hours ago||
> I know that "X is destroying democracy, vote for Y" has been a prevalent narrative lately, but is there any evidence that it's true? I get that it's death by a thousand cuts, or "one step at a time" as they say.

I suggest reading [1], [2], and [3]. From there, you'll probably have lots of background to pose your own research questions. According to [4], until you write about something, your thinking will be incomplete, and I tend to agree nearly all of the time.

[1]: https://en.wikipedia.org/wiki/Democratic_backsliding

[2]: https://hub.jhu.edu/2024/08/12/anne-applebaum-autocracy-inc/

[3]: https://carnegieendowment.org/research/2025/08/us-democratic...

[4]: "Neuroscientists, psychologists and other experts on thinking have very different ideas about how our brains work, but, as Levy writes: “no matter how internal processes are implemented, (you) need to understand the extent to which the mind is reliant upon external scaffolding.” (2011, 270) If there is one thing the experts agree on, then it is this: You have to externalise your ideas, you have to write. Richard Feynman stresses it as much as Benjamin Franklin. If we write, it is more likely that we understand what we read, remember what we learn and that our thoughts make sense." - Sönke Ahrens, How to Take Smart Notes, p. 30

Karrot_Kream 1 day ago||
Are you in the wrong thread?
dschnurr 1 day ago||
Nice! Something must be in the air – last week I built a very similar project using the historical archive of all-in podcast episodes: https://allin-predictions.pages.dev/
sanex 1 day ago|
I'll use this as evidence supporting my continued demand for a Friedberg only spinoff.
smugma 1 day ago|
I believe that the GPA calculation is off, maybe just for F's.

I scrolled to the bottom of the hall of fame/shame and saw that entry #1505 had 3 F's and a D, with an average grade of D+ (1.46).

With no grade better than a D, that shouldn't average to a D+; I'd expect it to be closer to 0.25.
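On a standard 4.0 scale (A=4 down to F=0; an assumption, since the site doesn't document its mapping), the expected average is easy to check:

```python
# Standard 4.0 grade points; the site's actual mapping is an assumption.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def gpa(grades):
    """Average grade points over a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

# Entry #1505: three F's and a D.
print(gpa(["F", "F", "F", "D"]))  # prints 0.25, not the 1.46 shown
```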
