
Posted by __rito__ 10 hours ago

Auto-grading decade-old Hacker News discussions with hindsight (karpathy.bearblog.dev)
Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
311 points | 150 comments
MBCook 9 hours ago
#272, I got a B+! Neat.

It would be very interesting to see this applied year after year to see if people get better or worse over time in the accuracy of their judgments.

It would also be interesting to correlate accuracy with scores, but I kind of doubt that can be done. People just expressing popular sentiment, plus first-to-post commenters getting more votes than later people who say the same thing, would probably make it not very useful data.
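Suppose you had per-comment scores (a hypothetical input here). A rank correlation is a reasonable first check for this question, because vote counts are heavily skewed and ranks are less sensitive to a few runaway comments. Below is a minimal Python sketch of Spearman's rho. It assumes the grader's letter grades have already been mapped to numbers, for example grade points. The function names are illustrative, not part of the project.

```python
from statistics import mean

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values (a tie group).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (var_x * var_y) ** 0.5
```

On Python 3.12+, `statistics.correlation(xs, ys, method="ranked")` computes the same quantity without the hand-rolled ranking.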

pjc50 7 hours ago
#250, but then I wasn't trying to make predictions for a future AI. Or anyone else, really. Got a high score mostly for status quo bias, e.g. visual languages going nowhere and FPGAs remain niche.
embedding-shape 4 hours ago
Yeah, it'd be much more interesting to see the people who made claims that were outrageous at the time but came true, rather than a list of people who said the status quo would most likely stay as it is.
scosman 7 hours ago
Anyone have a branch that I can run to target my own comments? I'd love to see where I was right and where I was off base. Seems like a genuinely great way to learn about my own biases.
moultano 9 hours ago
Notable how this is only possible because the website is a good "web citizen." It has URLs that maintain their state over a decade. They contain a whole conversation. You don't have to log in to see anything. The value of old proper websites increases with our ability to process them.
chrisweekly 9 hours ago
Yes! See "Cool URIs Don't Change"^1 by Sir TBL himself.

1. https://www.w3.org/Provider/Style/URI

dietr1ch 8 hours ago
> because the website is a good "web citizen." It has urls that maintain their state over a decade.

It's a shame that maintaining the web is so hard that only a few websites are "good citizens". I wish the web was a -bit- way more like git. It should be easier to crawl the web and serve it.

Say, you browse and get things cached and shared, but only your "local bookmarks" persist. I guess it's like pinning in IPFS.

moultano 7 hours ago
Yes, I wish we could serve static content more like BitTorrent, where your URI has an associated hash, any intermediate router or cache could be an equivalent source of truth, and the final server would only need to play a role if nothing else has it.

It is not possible right now to make hosting democratized/distributed/robust because there's no seamless way for people to donate their own resources to keeping things published. In an ideal world, the Internet Archive would seamlessly step in to serve any content that goes down, transparently to the user.
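Here is a minimal sketch of the content-addressed lookup described above. It assumes SHA-256 as the hash and a made-up `sha256-<hex>` address format; this is not IPFS's actual CID scheme. `content_address` and `fetch` are hypothetical names. Because the address is the hash of the bytes, every cache is as trustworthy as the origin, and the origin is consulted only when no cache returns bytes that verify.

```python
import hashlib
from typing import Callable, Iterable, Optional

def content_address(data: bytes) -> str:
    """Name content by its SHA-256 digest so the name itself can verify the bytes."""
    return "sha256-" + hashlib.sha256(data).hexdigest()

def fetch(address: str,
          caches: Iterable[dict],
          origin: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Try every cache first; fall back to the origin only if no cache has valid bytes."""
    for cache in caches:
        data = cache.get(address)
        # Any holder is as good as the origin, provided the bytes hash to the address.
        if data is not None and content_address(data) == address:
            return data
    # Nobody else had a verifiable copy: ask the original server.
    data = origin(address)
    if data is not None and content_address(data) == address:
        return data
    return None
```

A tampered copy in a cache simply fails verification and gets skipped. That property is what would let untrusted volunteers donate storage and bandwidth.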

oncallthrow 6 hours ago
This is IPFS
shpx 53 minutes ago
In my experience from the couple of times I clicked an IPFS link years ago, it loaded for a long time and never actually loaded anything, failing the very first "I wish we could serve static content" part.

If you make it possible for people to donate bandwidth you might just discover no one wants to donate bandwidth.

drdec 7 hours ago
> It's a shame that maintaining the web is so hard that only a few websites are "good citizens"

It's not hard actually. There is a lack of will and forethought on the part of most maintainers. I suspect that monetization also plays a role.

DANmode 7 hours ago
Let Reddit and friends continue to out themselves for who they are.

Keeps the spotlight on carefully protected communities like this one.

jeffbee 8 hours ago
There are things that you have to log in to see, and the mods sometimes move conversations from one place to another, and also, for some reason, whole conversations get reset to a single timestamp.
embedding-shape 8 hours ago
> and the mods sometimes move conversations from one place to another

This only manipulates the children references though, never the item ID itself. So if you have the item ID of an item (submission, comment, poll, pollItem), it'll be available there as long as moderators don't remove it, which happens very seldom.

latexr 8 hours ago
> for some reason, whole conversations get reset to a single timestamp.

What do you mean?

embedding-shape 8 hours ago
Submissions put in the second-chance pool briefly appear (sometimes "again") on the frontpage, and the conversation timestamps are reset so it appears like they were written after the second-chance submission, not before.
Y_Y 6 hours ago
I never noticed that. What a weird lie!

I suppose they want to make the comments seem "fresh" but it's a deliberate misrepresentation. You could probably even contrive a situation where it could be damaging, e.g. somebody says something before some relevant incident, but the website claims they said it afterwards.

embedding-shape 5 hours ago
I think the reason is much simpler than that. Resetting the timestamp lets them easily resurface things on the front page, because the delta between the current time and the posting time becomes much smaller, so the item ranks higher again. And avoiding a special case lets the rest of the codebase work exactly as it did before: they basically just need a "set submission time to now" function and get the rest for free.

But, I'm just guessing here based on my own refactoring experience through the years, may be a completely different reason, or even by mistake? Who knows? :)
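To see why a reset timestamp is enough to resurface a story, here is a sketch built on the ranking formula commonly cited from the old published Arc source: `(points - 1) / (age_hours + 2) ** 1.8`. This is an assumption. HN's current ranking almost certainly has additional penalties and adjustments, and `rank_score` is a hypothetical name.

```python
def rank_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Commonly cited HN-style score: (points - 1) / (age_hours + 2) ** gravity."""
    return (points - 1) / (age_hours + 2) ** gravity

# The same 50-point story, 24 hours after posting vs. right after a timestamp reset.
stale = rank_score(50, age_hours=24)
reset = rank_score(50, age_hours=0)
# A fresh rival story with fewer points, for comparison.
fresh_rival = rank_score(10, age_hours=1)

print(f"stale={stale:.3f} reset={reset:.3f} rival={fresh_rival:.3f}")
```

Under this formula, the reset multiplies the story's score by (26 / 2) ** 1.8, roughly 100. That is enough to put it ahead of newer, lower-scored stories, with no special-case code needed.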

jeffbee 8 hours ago
There is some action that moderators can take that throws one of yesterday's articles back on the front page and when that happens all the comments have the same timestamp.
consumer451 8 hours ago
I believe that this is called "the second chance pool." It is a bit strange when it unexpectedly happens to one's own post.
jacquesm 6 hours ago
Predictions are only valuable when they're actually made before the knowledge becomes available. "A man will walk on Mars by 2030" is falsifiable; "a man will walk on Mars" is not. A lot of these entries have little to no predictive value, or concern things that were already known at the time and are merely related. It would be nice if future "judges" put in more work to ensure quality judgments.

I would grade this article B-, but then again, nobody wrote it... ;)

npunt 1 hour ago
One of the few use cases for LLMs that I have high hopes for, and that I feel is still underappreciated, is grading qualitative things. LLMs are the first tech (afaik) that can do top-down analysis of phenomena in a manner similar to humans, which means a lot of important judgment-oriented human use cases can become more standardized, faster, and more readily available.

For instance, one of the unfortunate aspects of social media that has become so unsustainable and destructive to modern society is how it exposes us to so many more people and hot takes than we have ability to adequately judge. We're overwhelmed. This has led to conversation being dominated by really shitty takes and really shitty people, who rarely if ever suffer reputational consequence.

If we build our mediums of discourse with more reputational awareness using approaches like this, we can better explore the frontier of sustainable positive-sum conversation at scale.

Implementation-wise, the key question is how do we grade the grader and ensure it is predictable and accurate?

ComputerGuru 6 hours ago
Looking at the results and the prompt, I would tweak the prompt to

* ignore comments that do not speculate on something that was unknown or had not achieved consensus as of the date of yyyy-mm-dd

* at the same time, exclude speculations for which there still isn’t a definitive answer or consensus today

* ignore comments that speculate on minor details or are stating a preference/opinion on a subjective matter

* it is ok to generate an empty list of users for a thread if there are no comments meeting the speculation requirements laid out above

* etc
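Some of these rules could also be enforced as a post-filter on structured grader output rather than only in the prompt. The sketch below assumes the grader emits, for each candidate claim, a posting date, a resolution date, and a subjective/minor flag. The `Claim` schema and `gradable` function are hypothetical. The second date check also covers the "already happened when predicted" case.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Claim:
    user: str
    stated_on: date              # when the comment was posted
    resolved_on: Optional[date]  # when the outcome became known; None if still open
    subjective_or_minor: bool    # a preference/opinion, or a trivial detail

def gradable(claims: List[Claim]) -> List[Claim]:
    """Keep only genuine forecasts: open when stated, resolved by now, and substantive."""
    return [
        c for c in claims
        if not c.subjective_or_minor
        and c.resolved_on is not None    # still no definitive answer today -> skip
        and c.resolved_on > c.stated_on  # already known when stated -> skip
    ]
```

An empty result for a thread is a valid outcome, matching the last rule.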

losvedir 2 hours ago
Agreed. I feel like it's more just a collection of good comments. It doesn't surprise me to see tptacek, patio11, etc. there. I think the "prediction" aspect is underweighted.

But it reminds me that I miss Manishearth's comments! Whatever happened to him? I recall him being a big Rust contributor. I'd think he'd be all over the place, with Rust's adoption since then. I also liked tokenadult. Interesting blast from the past.

janalsncm 6 hours ago
You would also need to exclude “predictions” for things which already happened at the time they were predicted.
xpe 2 hours ago
Good points. To summarize: for a given comment, one presumably must downselect to the ones that can reasonably be interpreted as forecasts. I see some indicators that the creator of the project (despite his amazing reputation) skated over this part.
sigmar 4 hours ago
Gotta auto grade every HN comment for how good it is at predicting stock market movement then check what the "most frequently correct" user is saying about the next 6 months.
Rychard 4 hours ago
As the saying goes, "past performance is not indicative of future results"
xpe 2 hours ago
I hope this is a joke.

Forecasting and the meta-analysis of forecasters is fairly well studied. [1] is a good place to start.

[1]: https://en.wikipedia.org/wiki/Superforecaster
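The usual yardstick in that literature is the Brier score; the Good Judgment Project tournaments behind the superforecaster work scored forecasters with it. Here is a minimal sketch of the binary form. It assumes each comment's forecast could be reduced to a probability and each question resolved to yes/no, which HN comments rarely make easy. `brier_score` is an illustrative name.

```python
from typing import Sequence

def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes; lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    if any(not 0.0 <= p <= 1.0 for p in forecasts):
        raise ValueError("forecasts must be probabilities in [0, 1]")
    if any(o not in (0, 1) for o in outcomes):
        raise ValueError("outcomes must be 0 or 1")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

Hedging everything at 0.5 earns 0.25 no matter what happens, so a forecaster has to beat that baseline to show real skill.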

sigmar 2 hours ago
> The conclusion was that superforecasters' ability to filter out "noise" played a more significant role in improving accuracy than bias reduction or the efficient extraction of information.

>In February 2023, Superforecasters made better forecasts than readers of the Financial Times on eight out of nine questions that were resolved at the end of the year.[19] In July 2024, the Financial Times reported that Superforecasters "have consistently outperformed financial markets in predicting the Fed's next move"

>In particular, a 2015 study found that key predictors of forecasting accuracy were "cognitive ability [IQ], political knowledge, and open-mindedness".[23] Superforecasters "were better at inductive reasoning, pattern detection, cognitive flexibility, and open-mindedness".

I'm really not sure what you want me to take from this article? Do you contend that everyone has the same competency at forecasting stock movements?

dw_arthur 3 hours ago
Reading this I feel the same sense of dread I get watching those highly choreographed Chinese holiday drone shows.
SequoiaHope 4 hours ago
This is great! Now I want to run this to analyze my own comments and see how I score and whether my rhetoric has improved in quality/accuracy over time!
dschnurr 6 hours ago
Nice! Something must be in the air; last week I built a very similar project using the historical archive of All-In podcast episodes: https://allin-predictions.pages.dev/
sanex 6 hours ago
I'll use this as evidence supporting my continued demand for a Friedberg-only spin-off.