Remix clone Hacker News
at2005
5 hours ago
Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample.