It is still an open question whether RL will scale the same way pretraining does (at least as easily), or whether it is more effective at eliciting capabilities the model already has.