Remix.run Logo
GuB-42 3 days ago

> Thus it's impossible to draw any meaningful conclusions. It would be similar to if I claimed that an LLM is an expert doctor, but in my data I've filtered out all of the times it gave incorrect medical advice.

Not really, you can try to make illegal moves in chess, and usually, you are given a time penalty and get to try again, so even in a real chess game, illegal moves are "filtered out".

And for the "medical expert" analogy, let's say that you compare to systems based on the well being of the patients after they follow the advise. I think it is meaningful even if you filter out advise that is obviously inapplicable, for example because it refers to non-existing body parts.