Remix clone Hacker News


	▲	tjungblut 4 days ago
		If you are curious, like me, about how the actual reinforcement learning happens: it uses verl [1] underneath. The paper "HybridFlow: A Flexible and Efficient RLHF Framework" [2] explains it really well. [1] https://github.com/volcengine/verl [2] https://arxiv.org/abs/2409.19256v2