zug_zug 2 hours ago
Well then it shows that these models are using widely disparate training sets and have high confidence even when they shouldn't. Questions like "is mouthwash effective" presumably have one solid data source: medical journals.
simonw 2 hours ago
But the prompt didn't give the models the option to say "I don't know", so it wasn't a measure of their confidence.
TaupeRanger an hour ago
What are you talking about? The models were not ALLOWED to express confidence (or the lack thereof). They were explicitly told to give a single label, and in most cases, all of them were correct depending on the additional context they would surely have provided, especially with access to the internet (which some didn't have). This is just silly.