bunderbunder 12 hours ago

> The real question for me is: are they less reliable than human judges?

I've spent some time poking at this. I can't go into details, but the short answer is, "Sometimes yes, sometimes no, and it depends A LOT on how you define 'reliable'."

My sense is that the more boring, mechanical, and closed-ended the task is, the more likely an LLM is to be more reliable than a human, because an LLM is an unthinking machine. It doesn't get tired, or hangry, or stressed out about its kid's problems at school. But it's also a doofus with absolutely no common sense whatsoever.
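To make "it depends how you define 'reliable'" concrete, here's a toy sketch in plain Python (all labels invented, no real eval harness): the same agreement metric answers two different questions depending on what you compare the judge against. Kappa against human gold labels measures validity; kappa against the judge's own rerun measures consistency. An LLM judge can score high on the second and mediocre on the first.

    # Toy sketch: two definitions of "reliable" for an LLM judge.
    # All labels below are made up for illustration.
    from collections import Counter

    def cohen_kappa(a, b):
        """Chance-corrected agreement between two label sequences."""
        n = len(a)
        p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
        ca, cb = Counter(a), Counter(b)
        # expected agreement by chance, from each rater's label marginals
        p_e = sum((ca[l] / n) * (cb[l] / n) for l in set(a) | set(b))
        return (p_o - p_e) / (1 - p_e)

    human = ["pass", "pass", "fail", "pass", "fail", "fail"]
    llm   = ["pass", "fail", "fail", "pass", "fail", "fail"]
    rerun = ["pass", "fail", "fail", "pass", "pass", "fail"]  # same judge, rerun

    # Definition 1: does the judge agree with human gold labels? (validity)
    print("kappa vs. humans:", cohen_kappa(human, llm))
    # Definition 2: does the judge agree with itself on a rerun? (consistency)
    print("kappa vs. own rerun:", cohen_kappa(llm, rerun))

Kappa rather than raw percent agreement, because on skewed label distributions two raters can "agree" most of the time by chance alone.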

visarga 12 hours ago

> Because an LLM is an unthinking machine.

Unthinking can be pretty powerful these days.