simianwords a day ago

How do you quantify generality? If we have a benchmark that can quantify it, and that benchmark reliably tells us that the LLM is within human levels of generalisation, then the LLM is not distinguishable from a human.

While it’s a good point that we need to benchmark generalisation ability, you have in fact agreed that it is not important to understand underlying mechanics.

godelski 11 hours ago

That's kinda their point

The difference, though, is that they understand you can't just benchmark your way into proofs, just like you can't unit test your way into showing code is error-free. Benchmarks and unit tests are great tools that provide a lot of help, but just because a hammer is useful doesn't make everything a nail.
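
To make the unit test analogy concrete, here's a minimal sketch. The function and the test cases are made up for illustration (nothing here comes from a real codebase): every test passes, yet the code is still wrong for an input the tests never touch.

```python
def is_leap_year(year: int) -> bool:
    # Hypothetical, deliberately incomplete implementation:
    # it ignores the century rules (divisible by 100 but not by 400).
    return year % 4 == 0


def run_tests() -> bool:
    # A plausible-looking test suite that happens to miss the failing case.
    cases = {2024: True, 2023: False, 2000: True, 1996: True}
    return all(is_leap_year(year) == expected for year, expected in cases.items())


print(run_tests())         # the whole suite passes
print(is_leap_year(1900))  # but 1900 was not a leap year, and this says it was
```

Passing tests only tell you about the inputs you checked. Saying anything about all inputs takes a different kind of argument, which is the same gap as between a benchmark score and a proof.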