godelski 2 days ago
Correct. And this happens because training metrics are not aligned with training intent.
And this will be a natural consequence of the above. To clarify, it's like taking a math test where one grader only looks at the final answer while another looks at the work and gives partial credit. Who is doing a better job of measuring successful learning outcomes? The latter. With the former you can make mistakes that cancel out, or you can just cheat more easily. It's harder to cheat with the latter because you'd also need to reproduce all the intermediate steps, and at that point are you even cheating?

A common failure mode is that the LLM gets the right answer but all the steps are wrong. You can actually see an example of this in one of Karpathy's recent posts: the result is right but the math along the way is all wrong. This is no different from deception. It is deception, because the model presents a process, and that process isn't correct.
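To make the grading analogy concrete, here's a minimal toy sketch (all names and the toy task are my own illustration, not from any actual training setup): an outcome-only grader that checks just the final answer, versus a process grader that checks each intermediate step.

```python
# Toy sketch: outcome-only vs process-based grading of a multi-step
# "solution". Illustrative only; not any real reward model.

def grade_outcome(steps, expected):
    # Only the final answer matters: wrong steps can cancel out.
    return 1.0 if steps[-1] == expected else 0.0

def grade_process(steps, check_step, expected):
    # Partial credit: count how many steps validly follow from the
    # previous one, then require the final answer to also be right.
    valid = sum(check_step(a, b) for a, b in zip(steps, steps[1:]))
    credit = valid / max(len(steps) - 1, 1)
    return credit if steps[-1] == expected else 0.5 * credit

# Toy task: each step should add 1 to the previous value.
check = lambda a, b: b == a + 1

honest  = [1, 2, 3, 4]    # correct steps, correct answer
cancels = [1, 5, -2, 4]   # bogus steps that happen to land on 4

print(grade_outcome(honest, 4), grade_outcome(cancels, 4))        # 1.0 1.0
print(grade_process(honest, check, 4), grade_process(cancels, check, 4))  # 1.0 0.0
```

The outcome grader can't tell the two apart; the process grader can, which is exactly why it's the harder metric to game.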