Using reinforcement learning and $4.80 of GPU time to find the best HN post

Posted by kcorbitt 10/28/2024

Using reinforcement learning and $4.80 of GPU time to find the best HN post (openpipe.ai)

217 points | 95 comments | page 3

suyash 10/28/2024|

Very interesting project. I would love to read a more technical write-up on how the model was architected and trained. Any pointers?

kcorbitt 10/28/2024|

I link to it from the post, but all the code is open source! You can find the specific training script here: https://github.com/OpenPipe/best-hn/blob/main/stories_train_...

And all the graphs for the blog are from this notebook: https://github.com/OpenPipe/best-hn/blob/main/blog-figures.i...

Lots of other good stuff in that repo, although it's only organized to a "working researcher" standard I'm afraid.

octocop 10/29/2024||

Even the AIs don't read the content before up/down voting.

floobertoober 10/28/2024||

Maybe it would help to use a Box-Cox transform on the score distribution?
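For readers unfamiliar with the suggestion, here is a minimal sketch of a Box-Cox transform applied to a heavily skewed list of scores. It is only an illustration, not code from the linked repo: the sample `scores` are made up, the helper names `box_cox`, `log_likelihood`, and `fit_lambda` are hypothetical, and the grid search over lambda is a simple stand-in for a proper optimizer. It assumes every score is strictly positive.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam is 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def log_likelihood(values, lam):
    """Profile log-likelihood of lam, assuming the transformed data is normal."""
    y = box_cox(values, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_lambda(values, grid=None):
    """Pick the lam on a coarse grid that maximizes the log-likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # -2.00 .. 2.00
    return max(grid, key=lambda lam: log_likelihood(values, lam))


# Made-up, heavily right-skewed scores (most posts low, a few very high).
scores = [1, 1, 2, 2, 3, 5, 8, 14, 40, 217, 950]

lam = fit_lambda(scores)
transformed = box_cox(scores, lam)
print(f"fitted lambda: {lam:.2f}")
print([round(t, 3) for t in transformed])
```

The transform is monotonic, so the ranking of posts is unchanged; only the spacing between scores is compressed, which is the property the comment is pointing at.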

chx 10/28/2024||

> That’s not much time for a model that (hopefully) understands all of HN!

this is dangerous talk.

it doesn't understand anything at all.

Reminder: We are more prone to anthropomorphizing LLMs than to humanizing suffering humans.

ChrisArchitect 10/28/2024||

The first problem with the submissions that supposedly 'would do well on HN' is that, other than the Ask HN, they misuse the submission form by putting the content in a text post instead of sharing it directly as a link post. And they come from sketchy new/inactive accounts. C'mon. Not gonna keep reading a grifty post after that opening.

ivanovm 10/29/2024|

this is very cool, have you tried DPO?