beering 4 hours ago
My experience has been that this isn’t generally true, mainly because worse models pursue red herrings or get confused and stuck. A better model will get to the correct solution in fewer tokens, and my surface-level understanding of how RL works supports this.