whatshisface 6 hours ago
RL is barely even a training method; it's more of a dataset generation method.
theOGognf 5 hours ago
I feel like both this comment and the parent comment highlight how RL has recently been going through a cycle of misunderstanding, driven by yet another popularity boom now that it's being used to train LLMs.