Posted by kcorbitt 10/28/2024

Using reinforcement learning and $4.80 of GPU time to find the best HN post (openpipe.ai)
217 points | 95 comments | page 3
suyash 10/28/2024|
Very interesting project. I would love to read a more technical write-up on how the model was architected and trained. Any pointers?
kcorbitt 10/28/2024|
I link to it from the post, but all the code is open source! You can find the specific training script here: https://github.com/OpenPipe/best-hn/blob/main/stories_train_...

And all the graphs for the blog are from this notebook: https://github.com/OpenPipe/best-hn/blob/main/blog-figures.i...

There's lots of other good stuff in that repo, although it's only organized to a "working researcher" standard, I'm afraid.

octocop 10/29/2024||
Even the AIs don't read the content before up/down voting.
floobertoober 10/28/2024||
Maybe it would help to use a Box-Cox transform on the score distribution?
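
For anyone unfamiliar with the idea, here is a minimal sketch of what a Box-Cox transform on scores could look like. It is not taken from the linked repo: the `scores` list is made-up illustrative data, the helper names are hypothetical, and the lambda is picked by a simple grid search over the standard Box-Cox log-likelihood rather than a proper optimizer.

```python
import math

def boxcox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam is 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def boxcox_loglik(values, lam):
    """Profile log-likelihood of lam, assuming the transformed data is normal."""
    y = boxcox(values, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / n  # MLE variance (divide by n)
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, grid=None):
    """Pick the lam on a coarse grid that maximizes the log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 .. 2.0 in steps of 0.1
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))

# Hypothetical, heavily right-skewed scores (illustrative only, not real HN data).
scores = [1, 2, 3, 5, 8, 20, 150, 900]
lam = best_lambda(scores)
transformed = boxcox(scores, lam)
print(lam, transformed)
```

The transform is monotonic, so the ranking of posts is unchanged; only the spacing between scores is compressed, which is the part that might make a regression target easier to fit.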
chx 10/28/2024||
> That’s not much time for a model that (hopefully) understands all of HN!

this is dangerous talk.

it doesn't understand anything at all.

Reminder: We are more prone to anthropomorphizing LLMs than to humanizing suffering humans.

ChrisArchitect 10/28/2024||
The first problem with the submissions that supposedly 'would do well on HN': other than the Ask HN, they misuse the submission form by putting the content in a text post instead of sharing it directly as a link post. And they come from sketchy new/inactive accounts. C'mon. Not gonna keep reading a grifty post after that opening.
ivanovm 10/29/2024|
This is very cool. Have you tried DPO?