Reinforcement Learning from Human Feedback (arxiv.org)
56 points by onurkanbkrc 5 hours ago | 3 comments
klelatti 4 hours ago

Web version with links, etc:

https://rlhfbook.com/

verdverm 3 hours ago

The last time I saw Nathan say something about the book, he was actively working on the next version and looking for feedback. Check his socials.

leggerss an hour ago

You could say he's also learning from human feedback