silvertaza 6 hours ago
Still a huge hallucination rate, unfortunately, at 86%. For comparison, Opus sits at 36%. Source: https://artificialanalysis.ai/models?omniscience=omniscience...
dubcanada 5 hours ago
Grok is 17%? And that's the lowest, while most models are 80%+? Meanwhile, actual hallucination is probably closer to 100% depending on the question. This benchmark makes no sense.
simianwords 6 hours ago
There's something off with this, because Haiku should not be that good.
dakolli 5 hours ago
This indicates they want this behavior. They know the person asking the question probably doesn't understand the problem entirely (or why would they be asking?), so they'd prefer a confident response, regardless of outcome, because the point is to sell the perception of the technology's competency, not its actual capabilities, to a bunch of people who have no clue what they're talking about. LLMs will ruin your product. Have fun trusting a billionaire's thinking machine that they swear is capable of replacing your employees if you just pay them 75% of your labor budget.