| ▲ | sabareesh 5 days ago | |||||||||||||
Watch out, these models are hallucinating a lot more: https://artificialanalysis.ai/evaluations/omniscience?omnisc...
| ▲ | joecarpenter 5 days ago | parent | next [-] | |||||||||||||
Isn't it the opposite? From the link: Scores range from -100 to 100, where 0 means as many correct as incorrect answers, and negative scores mean more incorrect than correct. Gemini 3 Flash scored +13 in the test, more correct answers than incorrect.
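To make the scale quoted above concrete, here is a minimal sketch. It assumes the score is 100 * (correct - incorrect) / total, with abstentions counting as zero; that is one reading of the quoted description, not a formula confirmed by the benchmark. The function name and the counts are hypothetical.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    """Assumed scoring: +1 per correct answer, -1 per incorrect answer,
    0 per abstention, scaled to the range -100..100."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("need at least one question")
    return 100.0 * (correct - incorrect) / total


# Hypothetical counts (not real benchmark data) that would yield +13.
print(omniscience_index(correct=45, incorrect=32, abstained=23))
# As many correct as incorrect answers gives 0.
print(omniscience_index(correct=50, incorrect=50, abstained=0))
```

Under this assumption a positive score only says that correct answers outnumber incorrect ones; it says nothing about how often the model declined to answer.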
| ▲ | andai 5 days ago | parent | prev [-] | |||||||||||||
This model has the best score on that benchmark. Edit: Huh... It does score highest in "Omniscience", but it also scores very high on Hallucination Rate (where a higher score is worse)...
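The two numbers need not contradict each other. A rough sketch, under two assumptions that are not confirmed by the thread: the index is 100 * (correct - incorrect) / total, and the hallucination rate is the share of not-correct responses that were wrong answers rather than abstentions. The model names and counts are made up for illustration.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    # Assumed: correct = +1, incorrect = -1, abstained = 0, scaled to -100..100.
    return 100.0 * (correct - incorrect) / (correct + incorrect + abstained)


def hallucination_rate(incorrect: int, abstained: int) -> float:
    # Assumed: of the questions not answered correctly, the percentage
    # where the model gave a wrong answer instead of abstaining.
    not_correct = incorrect + abstained
    return 100.0 * incorrect / not_correct if not_correct else 0.0


# Hypothetical models, 100 questions each: (correct, incorrect, abstained).
models = {
    "eager": (60, 35, 5),      # answers almost everything
    "cautious": (30, 20, 50),  # abstains when unsure
}

for name, (c, i, a) in models.items():
    print(f"{name}: index={omniscience_index(c, i, a):.1f}, "
          f"hallucination_rate={hallucination_rate(i, a):.1f}%")
```

In this toy case the "eager" model has both the higher index and the higher hallucination rate, because it knows more but almost never abstains when it does not know.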