▲ | cainxinth 5 days ago | |
I find the leader board argument a little strange. All their enterprise clients are clamoring for more reliability from them. If they could train a model that conceded ignorance instead of guessing and thus avoid hallucinations, why aren't they doing that? Because of leader board optics? | ||
▲ | ospray 5 days ago | parent [-] | |
I think they are trying to communicate that their benchmarks will go down as they try to tackle hallucinations. Honestly I am surprised they didn't just say we think all benchmarks need a incorrect vs abstinence ratio so our cautious honest model can do well on that. Although they did seem to hint that's what they want. |