sdesol 10 days ago

With humans we have a decent understanding of what they are capable of: I trust a medical professional to provide medical advice and an engineer to provide engineering advice. LLMs can be unpredictable at times, and they can make errors in ways you would not imagine. Take the following examples from my tool, which show how GPT-4o and Claude 3.5 Sonnet can screw up.

In this example, GPT-4o cannot tell that GitHub is spelled correctly:

https://app.gitsense.com/?doc=6c9bada92&model=GPT-4o&samples...

In this example, Claude cannot tell that GitHub is spelled correctly:

https://app.gitsense.com/?doc=905f4a9af74c25f&model=Claude+3...

I still believe LLMs are a game changer, and I'm currently working on what I call a "Yes/No" tool, which I believe will make trusting LLMs a lot easier (for certain things, of course). The basic idea is that the "Yes/No" tool will let you combine models, samples, and prompts to arrive at a Yes or No answer.

Based on what I've seen so far, a single model can easily screw up, but it is unlikely that all of them will screw up at the same time.
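
A minimal sketch of that idea might look like the following. The `ask_model` helper is a hypothetical stub (you would wire it up to whichever provider APIs you use); the point is simply that each model is sampled several times, the yes/no votes are tallied, and an answer is only accepted when a large majority agrees.

```python
# Sketch of a "Yes/No" consensus check across several models.
# ask_model is a placeholder, not a real API.
from collections import Counter

def ask_model(model: str, prompt: str) -> str:
    """Placeholder: return 'yes' or 'no' from the given model for the prompt."""
    raise NotImplementedError("replace with a real LLM provider call")

def yes_no_consensus(models: list[str], prompt: str,
                     samples: int = 3, threshold: float = 0.8) -> str:
    """Sample every model several times; return 'yes', 'no', or 'unsure'."""
    votes = Counter()
    for model in models:
        for _ in range(samples):
            answer = ask_model(model, prompt).strip().lower()
            if answer in ("yes", "no"):
                votes[answer] += 1
    total = sum(votes.values())
    if total == 0:
        return "unsure"
    winner, count = votes.most_common(1)[0]
    # Only trust the result when a large majority of all samples agree.
    return winner if count / total >= threshold else "unsure"

# Example (model names are illustrative):
# yes_no_consensus(["gpt-4o", "claude-3.5-sonnet"],
#                  'Answer only "yes" or "no": is "GitHub" spelled correctly?')
```

The design choice here is that disagreement is surfaced as "unsure" rather than forced into a yes or no, which is exactly the situation where a human should take a second look.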

visarga 9 days ago

It's actually a great topic: both humans and LLMs are black boxes, and both rely on patterns and abstractions that are leaky. In the end it's a matter of trust, like going to the doctor.

But we have had extensive experience with humans, so it is natural that our trust in them is better defined; LLMs will come to be better understood as well. There is no central understander or source of truth, and that is the interesting part: it's a "blind men and the elephant" situation.

sdesol 9 days ago

We are entering the nondeterministic programming era, in my opinion. LLM applications will be designed with the idea that we can't be 100% sure, and whatever solution provides the most safeguards will probably be the winner.
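
For illustration, one such safeguard might look like the sketch below: treat the model's output as untrusted, validate it, and retry a bounded number of times before giving up. The `call_llm` helper and the expected JSON shape are hypothetical placeholders, not part of any particular framework.

```python
# Sketch of a validate-and-retry safeguard around a nondeterministic LLM call.
import json
from typing import Optional

def call_llm(prompt: str) -> str:
    """Placeholder: return the raw model response for the prompt."""
    raise NotImplementedError("replace with a real LLM call")

def get_validated_answer(prompt: str, retries: int = 3) -> Optional[dict]:
    """Accept a response only if it parses as JSON with the expected field."""
    for _ in range(retries):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: ask again
        if isinstance(data.get("answer"), str):
            return data
    return None  # caller must handle the case where no run passed validation
```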