| ▲ | goldenarm 4 hours ago | |||||||
The non-hallucination rate in AA-omniscience is SOTA, better than Opus 4.7, Gemini 3.1 Pro and GPT5.5! Congrats to the team | ||||||||
| ▲ | gslepak 2 hours ago | parent | next [-] | |||||||
> The non-hallucination rate in AA-omniscience is SOTA

Note that a perfect "non-hallucination rate" is rather meaningless, as such tests can contain human hallucinations. It means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test. | ||||||||
| ▲ | throawayonthe 3 hours ago | parent | prev | next [-] | |||||||
referencing this: https://artificialanalysis.ai/evaluations/omniscience?models... (had to add it to the chart, wasn't displayed by default. is it the lowest rate in the dataset or not?) | ||||||||
| ▲ | sheepscreek 3 hours ago | parent | prev | next [-] | |||||||
Truly incredible! Very impressed by their progress. I wonder how much they relied on their own chips for training. | ||||||||
| ▲ | baq 2 hours ago | parent | prev [-] | |||||||
wonder at which level there's a capability state transition? 5%? 1%? | ||||||||