didibus 5 days ago

When tuning predictive models, you always have to balance precision and recall, because 100% accuracy is never going to happen.
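A minimal sketch of that knob, with made-up scores and labels: raising the decision threshold of a toy classifier buys precision at the cost of recall.

    # Toy illustration: precision/recall at two confidence thresholds.
    # Scores and labels are made up for the example.
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]  # 1 = actually positive

    for threshold in (0.5, 0.75):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, l in zip(predicted, labels) if p and l)
        fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
        fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
        print(f"threshold={threshold}: "
              f"precision={tp / (tp + fp):.2f}, recall={tp / (tp + fn):.2f}")
    # threshold=0.5:  precision=0.60, recall=0.75
    # threshold=0.75: precision=0.67, recall=0.50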

In LLMs, that balance shows up as how often the model hallucinates versus how often it says it doesn't know. If you push toward precision you end up with a model that constantly refuses: What's the X of Y? I don't know. Can you implement a function that does K? I don't know how. What could be the cause of G? I can't say. As a user, that gets old fast; you just want it to try, take a guess, and let you be the judge of it.
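A minimal sketch of where that knob lives in an LLM setting, assuming a hypothetical answer_with_confidence helper that returns an answer plus a calibrated confidence score:

    # Sketch of an abstention knob; answer_with_confidence is a hypothetical
    # stand-in for "model answer plus calibrated confidence".
    def answer_or_refuse(question, answer_with_confidence, threshold=0.8):
        answer, confidence = answer_with_confidence(question)
        if confidence >= threshold:
            return answer  # low thresholds guess more, hallucinate more
        return "I don't know."  # high thresholds refuse more, hallucinate less

A threshold near 1.0 gives you the "constantly refuses" model above; near 0.0 it always takes the guess.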

Benchmarks and leaderboards usually lean toward recall because a model that always gives it a shot creates a better illusion of intelligence, even if some of those shots are wrong. That illusion keeps users engaged, which means more users and more money.
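One way to see the incentive, assuming the common scoring scheme where a wrong answer and an "I don't know" both score zero: guessing always has non-negative expected value, so a model tuned against the leaderboard learns to always guess.

    # Toy expected benchmark score, assuming 1 point for a correct answer,
    # 0 for a wrong one, and 0 for abstaining.
    p_correct = 0.3  # chance the model's guess is right
    expected_if_guess = p_correct * 1 + (1 - p_correct) * 0  # 0.3
    expected_if_abstain = 0.0
    # Guessing wins whenever p_correct > 0, no matter how often it's wrong.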

And that's why LLMs hallucinate :P

Difwif 5 days ago | parent

It would be interesting to see two versions of a model: a primary model tuned for precision and correctness that works with (or orchestrates) a creative model tuned for generating new, and potentially incorrect, ideas. The primary model is responsible for evaluating and reasoning about the ideas/hallucinations. Feels like a left-brain/right-brain architecture (even though that's an antiquated model of human brain hemispheres).
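A minimal sketch of that split, assuming two hypothetical model callables: creative_generate (high-temperature idea generation) and critic_score (correctness scoring by the primary model).

    # Generator/critic sketch; creative_generate and critic_score are
    # hypothetical stand-ins for two differently tuned models.
    from typing import Callable

    def propose_and_verify(
        question: str,
        creative_generate: Callable[[str, int], list[str]],  # n candidate answers
        critic_score: Callable[[str, str], float],           # correctness in [0, 1]
        n_candidates: int = 5,
        min_score: float = 0.7,
    ) -> str:
        # "Right brain": sample several possibly-wrong ideas.
        candidates = creative_generate(question, n_candidates)
        # "Left brain": score each idea and keep the best one.
        scored = [(critic_score(question, c), c) for c in candidates]
        best_score, best = max(scored)
        # The precision model abstains when nothing checks out.
        return best if best_score >= min_score else "I don't know."

The same shape already shows up in practice as best-of-n sampling with a separate verifier or reranker model picking among the candidates.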