nextworddev 5 days ago
Thanks. Is this mainly for verifiable tasks, or for any general task?
ag8 5 days ago
It's for any task that has an "eval", which usually means verifiable tasks or ones that can be judged by LLMs (e.g. see [0]). There's also been recent work, such as BRPO [1] and similar approaches, on giving more and more "non-verifiable" tasks verifiable rewards!
-_- 5 days ago
There needs to be some way of automatically assessing performance on the task, though this could be with a Python function or another LLM as a judge (or a combination!) |
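
To make the "combination" idea concrete, here is a minimal sketch of a reward that blends a plain Python check with a judge score. Everything in it is an assumption for illustration: the names `exact_match_reward`, `combined_reward`, and `length_judge` are hypothetical, the 50/50 weighting is arbitrary, and the judge is a stand-in callable where a real setup would call an LLM.

```python
from typing import Callable


def exact_match_reward(completion: str, expected: str) -> float:
    """Programmatic check: 1.0 if the normalized answer matches, else 0.0."""
    return 1.0 if completion.strip().lower() == expected.strip().lower() else 0.0


def combined_reward(
    completion: str,
    expected: str,
    judge: Callable[[str], float],
    judge_weight: float = 0.5,  # arbitrary weighting, chosen for illustration
) -> float:
    """Blend the programmatic check with a judge score."""
    programmatic = exact_match_reward(completion, expected)
    # Clamp the judge score to [0, 1] so a misbehaving judge cannot dominate.
    judged = min(1.0, max(0.0, judge(completion)))
    return (1.0 - judge_weight) * programmatic + judge_weight * judged


def length_judge(completion: str) -> float:
    """Stand-in for an LLM judge (hypothetical): rewards concise answers."""
    return 1.0 if len(completion) <= 20 else 0.0


if __name__ == "__main__":
    print(combined_reward(" Paris ", "paris", length_judge))
    print(combined_reward("London", "paris", length_judge))
```

Swapping `length_judge` for a function that prompts another model and parses a score would give the LLM-as-judge variant; the clamping is there because judge outputs are not guaranteed to stay in range.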