I think what trips people up is that LLMs and humans are both lossy, but in different ways.

The intuitions we've developed from previous interactions are very misleading when applied to LLMs. When interacting with a human, we're used to being able to ask a question about topic X in context Y and assume that if they can answer it, we can rely on them to talk about it in the very similar context Z.

But LLMs are bad at reversed facts: "A is B" and "B is A" can have very different performance characteristics (the so-called reversal curse). Just because a model can answer A=B does not mean it is good at answering B=A; you have to test each direction separately.
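A minimal sketch of what testing both directions could look like; ask_model, the prompt templates, and the example fact are all placeholders for illustration, not any real benchmark or API:

    # Score a fact in both directions instead of assuming
    # forward accuracy implies backward accuracy.

    def ask_model(prompt: str) -> str:
        # Placeholder: swap in a real LLM call here.
        return "France"  # canned answer so the sketch runs as-is

    def test_fact(subject: str, relation: str, obj: str) -> dict:
        forward = ask_model(f"{subject} {relation} ___. Fill in the blank.")
        backward = ask_model(f"___ {relation} {obj}. Fill in the blank.")
        return {
            "forward_ok": obj.lower() in forward.lower(),
            "backward_ok": subject.lower() in backward.lower(),
        }

    print(test_fact("Paris", "is the capital of", "France"))
    # -> {'forward_ok': True, 'backward_ok': False}
    # Report the two directions separately; averaging them hides
    # exactly the asymmetry you're trying to detect.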

I've seen researchers who should really know better screw this up, rendering their methodology useless for the claim they were trying to validate. Our intuition for how humans do things just doesn't carry over to LLMs.