user_7832 an hour ago:
> If they miss a word they never do unintelligible, they just start playing madlibs based on the rest of the sentence.

IMO this is the single biggest flaw of LLMs. They're great at a lot of things, but knowing when they're wrong (or don't have enough information to actually work with) is a critical weakness. There's nothing structural about why they shouldn't be able to spot this and correct themselves - I suspect it's a training issue. But presumably bots that infer context and fill in the gaps rank better on what people like... at the cost of accuracy.
r_lee 33 minutes ago:
I don't think it's a training issue. There's simply no inherent "I don't know" in the transformer architecture: unless the input is something completely unknown, the nearest neighbor gets chosen, and that will be whatever sounds similar or seems relevant, even if it causes a problem.
moffkalast 9 minutes ago:
It's a benchmark and eval issue. Guessing sometimes gets them the right result, so the models rank better on error rate than they otherwise would. We need benchmarks that penalize being wrong WAY more than saying "I don't know". Of course there's a secondary problem that the model may then overuse the "unintelligible" option, but that's a matter of training it properly against such an eval. You could also try thresholding the output on perplexity to remove the parts the model is less sure about, though I don't think that would be very accurate.
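To make the scoring idea concrete, here's a minimal sketch of a benchmark metric that rewards abstaining over guessing. All names and the specific penalty values are illustrative, not from any real benchmark; the point is just that with a -4 penalty for wrong answers, guessing only pays off in expectation when the model is more than 80% sure.

```python
# Hypothetical scoring rule: correct = +1, abstain = 0, wrong = -4.
# Expected value of guessing with confidence p is p*1 + (1-p)*(-4),
# which is positive only when p > 0.8 - so a calibrated model should
# abstain below that confidence.
def score(prediction, truth, abstain_token="IDK",
          correct=1.0, wrong=-4.0, abstain=0.0):
    if prediction == abstain_token:
        return abstain
    return correct if prediction == truth else wrong

def benchmark(predictions, truths):
    """Average per-item score over a labeled eval set."""
    return sum(score(p, t) for p, t in zip(predictions, truths)) / len(truths)
```

Under this rule, a model that answers one item correctly, abstains on one, and guesses one wrong scores (1 + 0 - 4) / 3, i.e. worse than abstaining on everything.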
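The thresholding idea can also be sketched in a toy form. Assuming you have per-token log-probabilities from the decoder (most inference APIs can expose these), you could mask out tokens the model assigned low probability; the function name, the 0.5 cutoff, and the placeholder string are all illustrative choices, not a real API.

```python
import math

# Toy confidence filter: replace tokens whose probability fell below
# a cutoff with a placeholder, leaving confident tokens untouched.
# Real systems would need smarter handling (e.g. spans, calibration).
def mask_low_confidence(tokens, logprobs, min_prob=0.5,
                        placeholder="[unintelligible]"):
    threshold = math.log(min_prob)
    return [tok if lp >= threshold else placeholder
            for tok, lp in zip(tokens, logprobs)]
```

The catch, as noted above, is that raw token probability is a noisy proxy for correctness: a model can be confidently wrong, which is exactly the calibration problem.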