Terr_ 11 hours ago

> The real question for me is: are they less reliable than human judges?

I'd caution that it's never just about ratios: We must also ask whether the "shape" of their performance is knowable and desirable. A chess robot's win-rate may be wonderful, but we are unthinkingly confident a human wouldn't "lose" by disqualification for ripping off an opponent's finger.

Would we accept a "judge" that is fairer on average... but gives ~5% lighter sentences to people wearing a certain color of shirt, or sometimes issues the death penalty for shoplifting? Especially when we cannot diagnose the problem or be sure we fixed it? (Maybe, but hopefully not without a lot of debate over the risks!)

In contrast, there's a huge body of... of stuff regarding human errors, resources we deploy so pervasively that they can escape our awareness: Your brain is a simulation and diagnostic tool for other brains, battle-tested (sometimes literally) over millions of years; we intuit many kinds of problems or confounding factors to look for, often because we've made those mistakes ourselves; and we have thousands of years of cultural practice in detection, guardrails, and error-compensating actions. Only a small minority of that toolkit can be reused for "AI."