littlestymaar 3 days ago

It's a good way to assess the model with respect to hallucinations, though.

I don't think a model needs to know the answer, but if you want to use it reliably, it must be able to know that it doesn't know.

esafak 3 days ago | parent [-]

No model is good at this yet. I'd expect the flagships to solve the first.