whatshisface 6 hours ago

RL is barely even a training method; it's more of a dataset generation method.

theOGognf 5 hours ago

I feel like both this comment and the parent comment highlight how RL has recently been going through a cycle of misunderstanding, driven by another of its popularity booms now that it's being used to train LLMs.

mistercheph 3 hours ago

Care to correct the misunderstanding?