at2005 5 hours ago

Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample.