whimsicalism 6 days ago

It is still an open question whether RL will scale the same way pretraining does (at least as easily), or whether it is mainly effective at eliciting capabilities the model already has.