Remix clone Hacker News


	▲	nextworddev 5 days ago
		Thanks. Is this mainly for verifiable tasks, or for any general task?
	▲	ag8 5 days ago
		It's for any task that has an "eval", which usually means tasks that are verifiable or that can be judged by LLMs (e.g. see [0]). There's also been recent work such as BRPO [1] and similar approaches that give more and more "non-verifiable" tasks verifiable rewards!
		[0]: https://runrl.com/blog/funniest-joke
		[1]: https://arxiv.org/abs/2506.00103
	▲	-_- 5 days ago
		There needs to be some way of automatically assessing performance on the task, though this could be a Python function, another LLM acting as a judge, or a combination of the two!
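To make the "combination" idea concrete, here is a rough sketch of what such a scoring function might look like. Everything in it is hypothetical: `exact_match_reward`, `combined_reward`, and `stub_judge` are illustrative names, not part of any product or library mentioned in this thread, and the judge is a stand-in callable where a real LLM call would go.

```python
from typing import Callable


def exact_match_reward(completion: str, expected: str) -> float:
    """Programmatic check: 1.0 if the normalized answer matches, else 0.0."""
    return 1.0 if completion.strip().lower() == expected.strip().lower() else 0.0


def combined_reward(
    completion: str,
    expected: str,
    judge: Callable[[str], float],
    weight: float = 0.5,
) -> float:
    """Blend the programmatic check with a judge score.

    `judge` is assumed to return a score in [0, 1]; it is clamped here in
    case it does not. `weight` is the share given to the programmatic check.
    """
    judge_score = min(1.0, max(0.0, judge(completion)))
    return weight * exact_match_reward(completion, expected) + (1 - weight) * judge_score


def stub_judge(completion: str) -> float:
    """Placeholder for an LLM-as-judge call (hypothetical): prefers short answers."""
    return 1.0 if len(completion.split()) <= 5 else 0.25


if __name__ == "__main__":
    print(combined_reward(" Paris ", "paris", stub_judge))
    print(combined_reward("I think the answer is probably Lyon", "paris", stub_judge))
```

Passing the judge in as a plain callable keeps the programmatic part testable on its own, and the stub can be swapped for a real model call without touching the blending logic.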