goldenarm 4 hours ago

The non-hallucination rate in AA-omniscience is SOTA, better than Opus 4.7, Gemini 3.1 Pro and GPT5.5! Congrats to the team

gslepak 2 hours ago | parent | next [-]

> The non-hallucination rate in AA-omniscience is SOTA

Note that a perfect "non-hallucination rate" is rather meaningless as such tests can contain human hallucinations.

It means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test.

rlt 2 hours ago | parent [-]

Well, yes, garbage in garbage out. That's a given and not what's meant by "hallucination" in this context.

throawayonthe 3 hours ago | parent | prev | next [-]

referencing this:

https://artificialanalysis.ai/evaluations/omniscience?models...

(had to add it to the chart, wasn't displayed by default. is it the lowest rate in the dataset or no?)

sheepscreek 3 hours ago | parent | prev | next [-]

Truly incredible! Very impressed by their progress. I wonder how much they used their own chips for training.

baq 2 hours ago | parent | prev [-]

wonder at which level there's a capability state transition? 5%? 1%?