Posted by kcorbitt 10/28/2024
> Even if the model gets extremely good at predicting final_score_if_it_hits_front_page, there’s still the inherent randomness of probability_of_hitting_front_page that is fundamentally unpredictable.
In addition to date, you might want to include three fields:
- day of week (categorical)
- is weekend/holiday (boolean)
- hour of day (categorical; you can use 24 buckets, or coarser ones like morning/afternoon/evening).
The probability of a post hitting the front page is usually affected by all of these, so they can really help the model.
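A minimal sketch of deriving those three fields from a post's timestamp (the holiday set below is a stub, not a real calendar):

```python
from datetime import datetime

# Illustrative (month, day) pairs only; use a real holiday calendar in practice.
HOLIDAYS = {(1, 1), (7, 4), (12, 25)}

def time_features(posted_at: datetime) -> dict:
    """Derive the three suggested fields from a post's timestamp."""
    return {
        "day_of_week": posted_at.strftime("%A"),             # categorical
        "is_weekend_or_holiday": posted_at.weekday() >= 5
            or (posted_at.month, posted_at.day) in HOLIDAYS,  # boolean
        "hour_of_day": posted_at.hour,                        # categorical, 0-23
    }

print(time_features(datetime(2024, 10, 28, 14, 30)))
```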
It's counterintuitive, but if you post at a really popular time, you're competing with a lot of other submissions. If you post at a really slow time, you'll get fewer votes, but it will take fewer to reach the front page and you'll have less competition.
In the end, it kinda evens out: the number of votes it takes to get to the front page and the number of competing submissions are both correlated with the fields above.
Somehow this reminded me of someone data-mining spiegel.de (a German news site) and using the timestamps of the posted articles to infer the writers' religions (from holidays) and relationships (from shared vacations), among dozens of other data points, from several years of publicly available data. I think no AI was involved back then.
https://media.ccc.de/v/33c3-7912-spiegelmining_reverse_engin...
This has been studied multiple times in HN posts, but most seem to have link-rotted; Web Archive them if you're looking for insights - https://hn.algolia.com/?q=best+time+to+post
* 1 had a score that was reasonably close (8.4%) to what the model predicted
* 4 had scores wildly lower than the model predicted
* 2 had scores wildly higher than the model predicted
* the remaining 3 were not wildly off, but weren't really that close either (25%-42% off)
Then there's a list of 10 submissions that the model predicted would have scores ranging from 33 to 135, but they all only received a score of 1 in reality.
The graph shown paints a bit of a better picture, I guess, but it's still not all that compelling to me.
Broadly, the main use case for this model (in the RL context) will be to take two different versions of the same post and predict which of the two is more likely to be upvoted. So what matters isn't that it gets the exact number of upvotes right, but that it correctly predicts the relative difference in likely upvote count between two variants.
Now, it still doesn't do a great job at that (the correlation is only 0.53, after all), but it does well enough to provide some useful signal.
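If you want to evaluate it on exactly that, pairwise accuracy is a natural metric: the fraction of pairs the model orders the same way reality did. A minimal sketch, with made-up (predicted, actual) score pairs:

```python
import itertools

# Hypothetical (predicted, actual) score pairs for a handful of posts.
results = [(33, 1), (135, 12), (8, 40), (50, 55), (20, 3)]

# Count how often the model ranks a pair of posts in the same order
# as their actual scores -- the thing the RL use case cares about.
correct = total = 0
for (p1, a1), (p2, a2) in itertools.combinations(results, 2):
    if a1 == a2:
        continue  # skip ties in the actual scores
    total += 1
    correct += (p1 > p2) == (a1 > a2)
print(f"pairwise accuracy: {correct / total:.2f}")
```

A pairwise accuracy of 0.5 is a coin flip; anything consistently above that is usable signal for comparing variants.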
But the number of comments depends more on the time posted than on the story itself, and that information isn't in the model.
> The correlation is actually not bad (0.53), but our model is very consistently over-estimating the score at the low end, and underestimating it at the high end. This is surprising; some variation on any given data point is expected, but such a consistent mis-estimation trend isn’t what we’d expect.
This is a consequence of the model objective. If the model can't tell which posts will really take off, a good way to reduce its overall error is to hedge toward the middle. If it instead tried to exactly predict the very highs and very lows, it would take very large errors on the ones it gets wrong, resulting in a bigger overall error.
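A toy numeric illustration of why an MSE-trained model hedges like this (numbers invented for the example):

```python
import numpy as np

# Toy world: a post's true score is usually 0, but 10% of the time it's 100.
rng = np.random.default_rng(0)
true_scores = np.where(rng.random(10_000) < 0.9, 0, 100)

# MSE is minimized by predicting the conditional mean (~10 here), which
# overestimates the common low scores and underestimates the rare highs.
for pred in [0, 10, 100]:
    mse = np.mean((true_scores - pred) ** 2)
    print(f"predict {pred:>3}: MSE = {mse:,.0f}")
# predict   0: MSE ~ 1,000  (huge misses on the rare hits)
# predict  10: MSE ~   900  (lowest: wrong everywhere, but never wildly)
# predict 100: MSE ~ 9,000
```

The conditional mean (10) wins on MSE even though it's never the right answer, which is exactly the "overestimate the lows, underestimate the highs" pattern the post observed.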
Apart from that, I want to comment on AI alignment here. For me, the objective of "most upvotes" is not fully correlated with where I get the most value on HN. Most of the time, the most upvoted stories I would have found anyway on other platforms. It's the middle range that I really like. So be careful implementing this algorithm at scale; it could turn the website into another platform with shitty AI recommendations.
Yes, this is a fantastic point. I'm curious if there's some other measurable proxy metric for "things I get the most value out of on HN"? Upvotes seems like the most natural but optimizing for it too strongly would definitely take HN down a dark path.
https://scikit-learn.org/dev/modules/generated/sklearn.isoto...
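That is, fitting an isotonic calibration layer on top of the raw model predictions. A minimal sketch with sklearn's IsotonicRegression (the data below is made up to mimic the post's "compressed at both ends" pattern):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Dummy stand-ins for a held-out calibration set: raw model predictions
# that overestimate low scores and underestimate high ones.
rng = np.random.default_rng(0)
scores_val = rng.exponential(scale=20, size=500)
preds_val = 10 + 0.5 * scores_val + rng.normal(0, 5, size=500)

# Learn a monotonic mapping from raw predictions to observed scores.
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(preds_val, scores_val)

# Apply it to new raw predictions at inference time.
print(calibrator.predict([12.0, 30.0, 60.0]))
```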
I also agree with your intuition that if your output is censored at 0, with a large mass there, it's good to create two models: one for the likelihood of zero karma, and another for expected karma, conditional on it being non-zero.
> it's good to create two models: one for the likelihood of zero karma, and another for expected karma, conditional on it being non-zero.
Another way to do this is to keep a single model but have it predict two outputs: (1) likelihood of zero karma, and (2) expected karma if non-zero. This would require writing a custom loss function which sounds intimidating but actually isn't too bad.
If I were actually putting a model like this into production at HN I'd likely try modeling the problem in that way.
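A minimal sketch of that custom loss in PyTorch (the names, shapes, and choice of squared error are my assumptions, not the post's actual code):

```python
import torch
import torch.nn.functional as F

def two_head_loss(p_zero_logit, karma_pred, karma_true):
    """Loss for a model with two outputs per post:
    (1) a logit for P(karma == 0), (2) predicted karma if non-zero."""
    is_zero = (karma_true == 0).float()
    # Head 1: binary cross-entropy on "did it get zero karma?"
    zero_loss = F.binary_cross_entropy_with_logits(p_zero_logit, is_zero)
    # Head 2: squared error on karma, counted only for non-zero examples.
    nonzero = 1.0 - is_zero
    reg_loss = (nonzero * (karma_pred - karma_true) ** 2).sum() / nonzero.sum().clamp(min=1)
    return zero_loss + reg_loss

# Toy batch: two posts got 0 karma, one got 42.
loss = two_head_loss(
    p_zero_logit=torch.tensor([2.0, 1.5, -3.0]),
    karma_pred=torch.tensor([5.0, 1.0, 40.0]),
    karma_true=torch.tensor([0.0, 0.0, 42.0]),
)
print(loss)
```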
The reason I think of this as censoring is that there are some classical statistical models for a distribution with a large mass at a minimum threshold, e.g. "tobit" censored regression.
(Fully dictated, no edits except for this)
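For reference, the standard tobit setup (my notation, not from the comment above) models a latent score that gets censored at zero:

$$y_i^* = x_i^\top \beta + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad y_i = \max(0,\, y_i^*)$$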
Based on the later analysis in the post (which I agree with), the total score of a post is disproportionately tied to whether it hits the front page, and of course how long it stays there. Regardless of the quality of the average post, starting in 2015 the sheer quantity would make it impossible for all but a few to stay on the front page for very long. Hacker News got more popular, so each story got less prime time.
What is your take on this?
I generally find these posts pretty boring, and most comments on them are people recounting their own stories about how that (or a similar) service screwed them over. I suppose they can be a decent way to warn people off of a particular product (scammy, terrible customer support, whatever), but that's not what I come to HN for.
Model correlation is decent here but there's certainly more to do to use its outputs predictively.
My point is I don't think people seek out outrage. Social media's algorithms may not explicitly reward it as transparently as `if (post.outrage > 100) post.boost()`, but outrage isn't some default rule of interaction.
Give people a way to repost / retweet / boost, and your feed suddenly turns into mostly negativity, even if your algorithm is "show posts from my followers only, newest to oldest".
See also https://en.wikipedia.org/wiki/Negativity_bias
We're just built like that.
Regarding text platforms suffering more than non-text platforms, I think it's because of the lack of social cues that are otherwise there. You can infer a lot from the way someone talks, or from their body language. You can't infer much from text, which is partly why Poe's law exists -- sarcasm doesn't translate well.
It was definitely there. Plenty of forums had "rant threads" that were efforts to quarantine shitty reactionary behavior like this. Also, a lot of the healthier forums were smaller forums. I was on plenty of forums that had 10-20 folks on them that today would just be a Telegram group chat or a small Discord "server". These small spaces tend to be a lot lower on toxicity than larger fora. I was part of a few large fora like Gaia Online and they were just as toxic as today's large platforms. Managing large communities with chronological posting is really difficult, and upvote-based social networks were the first real networks able to scale to larger userbases without having hundreds of moderators (like Gaia or the large MUDs).
> What about 4chan?
4chan is immune because the default emotional register there is indignant dismissal. Because of this, it's just a matter of choosing what else to layer on top of the indignant dismissal, like sarcasm or anger or whatnot.
> Regarding text platforms suffering more than non-text platforms, I think it's because of the lack of social cues that are otherwise there. You can infer a lot from the way someone talks, or from their body language. You can't infer much from text, which is partly why Poe's law exists.
That's an interesting theory actually. My theory was that in the age of multimedia platforms, text platforms tend to attract folks who specifically want to use text over multimedia. Generally text forums will select for folks with social or self-esteem issues. These folks are the least likely to healthily deal with their emotions or disengage positively. This leads to higher toxicity on text based platforms.
Some people like to take time to compose thoughts in written form because that is generally the best way to communicate thoughtfully. You can say what you will about a lack of body language, but plenty of people get into verbal fights in person and it doesn't help that they end up talking over each other.
I think that your assertion that people who communicate via text have social issues is without evidence and is reductive.
You could say that people who enjoy looking at themselves and hearing themselves enough to edit their footage and post it online have ego issues and are less likely to listen to what others have to say.
The direction of my implication comes from observation: text communities tend to descend into toxicity (observation) -> why does this happen in text communities more so than in non-text communities? (question) -> a higher proportion of socially maladapted people (theory). You might well be correct that people who enjoy looking at and hearing themselves, and who have ego issues, are the ones that prefer (compose a higher proportion of) multimedia social networks. I don't disagree with you, either. That's beside the point. The point is that most text communities tend to descend into toxicity.
Humans aren't perfect, and if I'm in a positive community of high egos, I'd much prefer that to a toxic community with "normal" egos.
So I want to zoom in on this:
> Some people like to take time to compose thoughts in written form because that is generally the best way to communicate thoughtfully. You can say what you will about a lack of body language, but plenty of people get into verbal fights in person and it doesn't help that they end up talking over each other.
We're talking about social networks here, not real life, because social networks deal with a fundamentally different problem. In a social network (yes, this includes IRC) you are interacting with a number of people with whom you do not share any real-world context, with whom you do not share any physical space, and who generally have a much lower stake in their relationships because of that lack of shared context.
In my experience all textual social networks that grow beyond a certain number of users descend into toxicity: Usenet, IRC (old Freenode and Rizon), Slashdot, Digg, Reddit, HN, Youtube Comments, Nextdoor, Local News Comments, Twitter/X, etc. I think "algorithms" (including counting upvotes) have reduced the moderation burden and allowed social sites to scale much higher than they could before algorithms.
Text communities all eventually collapse into ranting, bullying, hot takes, moral outrage, zealotry, and negativity. I'm open to any and all theories about why this is, but I find it specific to text-based communities: Twitch, Instagram, and TikTok have so much less of it, for example. I think the idea that text leads to thoughtful communication was a hypothesis advanced first during the Usenet era and later during the blogging era, but it ended up being disproven. I think there's a nostalgia for the pre-media web that pervades these discussions and prevents text fans from realizing, at a macro level, that the toxicity that was on comp.lang.lisp is the same toxicity in HN comments, and is toxicity that just isn't there on most of Instagram, for better or for worse.
I actually think this identity around being a "text person" is part of the problem. The moment you wrap your identity around something you become both proud and protective of it. For some things this is fine, but if your preferred media itself becomes part of your identity, then you're going to have a blind spot around what makes your preferred social media different from the others.
What exactly is a 'multimedia community' anyway? You haven't defined it. Is it just TikTok?
If you want another perspective on my point, take a look at https://www.reddit.com/r/slatestarcodex/comments/9rvroo/most...
Have a nice day.
The fact that you cannot even engage to answer what a multimedia community is without claiming that I am acting in bad faith in order to jump out of an escape hatch is telling.
Your lack of self-awareness is astonishing.
I'd be interested in any sort of evidence that supports this
Yeah that's very plausible indeed
I'm sure it's just human psyche but I'm trying to overcome it and make my life more positive again
If the reward model is indeed smart enough to be able to take that into account you could actually use it to plan the optimal time of day to post a specific story! You could just use the reward model to compute a predicted score for 8 different versions of your content, holding the post title/text constant across them all and just changing the date. Based on the differences in scores, you can determine which posting time the RM thinks is most likely to make your post successful!
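A minimal sketch of that search, assuming some `predict_score` wrapper around the reward model (the wrapper and the 3-hour grid are hypothetical):

```python
from datetime import datetime, timedelta

def predict_score(title: str, text: str, posted_at: datetime) -> float:
    # Stand-in for the real reward model; swap in an actual model call.
    return -abs(posted_at.hour - 9)  # toy stub that pretends 9am is best

title, text = "Show HN: My side project", "A short description"
base = datetime(2024, 10, 28)

# Score 8 versions of the same post, identical except for posting time,
# and pick the time the reward model scores highest.
candidates = [base + timedelta(hours=3 * i) for i in range(8)]
best = max(candidates, key=lambda t: predict_score(title, text, t))
print("post at:", best)
```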
You see this on Reddit pretty commonly.
Someone posts original content at an off time and gets a small-to-moderate number of upvotes. Then some time later (could be hours, days, or weeks) a bot/karma account will repost the content at an optimal time to farm upvotes.