LoganDark 6 hours ago

Much of the field of artificial intelligence is built around the goal of a general reasoning machine comparable to a human. There are many subfields that are less concerned with this, but in practice, artificial intelligence is perceived to have that goal.

I am sure the output of current frontier models is convincing enough that, to some people, it appears to outperform humans. There is still an ongoing outcry over the discontinuation of GPT-4o from users who had built a romantic relationship with their access to it. However, I am not convinced that language models have actually reached the reliability of human reasoning.

Even a dumb person can be consistent in their beliefs, and apply them consistently. Language models strictly cannot. You can prompt them to maintain consistency according to some instructions, but you never quite have any guarantee. You have far less of a guarantee than you would with a human who holds those beliefs, or even a human given those instructions.

I don't have citations for the objective reliability of human reasoning. There are statistics about the unreliability of human reasoning, and statistics about the unreliability of language models that far exceed those. But both are subjective in many cases, and success or failure rates are actually no indication of reliability whatsoever anyway.

On top of that, every human is different, so it's difficult to make general statements. I only know from my work circles and friend circles that most of the people I keep around outperform language models in consistency and reliability. Of course that doesn't mean every human or even most humans meet that bar, but it does mean human-level reasoning includes them, which raises the bar that models would have to meet. (I can't quantify this, though.)

There is a saying about fully autonomous self-driving vehicles that goes a little something like: they don't just have to outperform the worst drivers; they have to outperform the best drivers, for it to be worth it. Many crashes involving fully autonomous vehicles happen because the autonomous system screwed up in a way that a human would not. An autonomous system typically lacks the creativity and ingenuity of a human driver.

Though they can already be more reliable in some situations, we're still far from a world where autonomous driving systems can take liability for collisions, and that's because they're not nearly reliable or intelligent enough to entirely displace the need for human attention and intervention. I believe Waymo is the closest we've gotten, and even they have remote safety operators.

throwway120385 2 hours ago

It's not enough for them to be "better" than a human. When they fail, they also have to fail in a way that is legible to a human. I've seen ML systems fail in scenarios that are obvious to a human and succeed in scenarios where a human would have found it impossible. The opposite needs to be the case for them to be generally accepted as equivalent, and especially the failure modes need to be confined to cases where a human would have also failed. In the situations I've seen, customers have been upset about the performance of the ML model because the solution to the problem was patently obvious to them. They've probably been more upset about that than about situations where the ML model fails and the end customer also fails.

gaigalas 5 hours ago

That's not a citation.

LoganDark 4 hours ago

It's roughly why I think this way, along with a statement that I don't have objective citations. So sure, it's not a citation. I even said as much, right in the middle there.