| ▲ | aembleton 2 hours ago | |||||||
> You’re literally just using three different slot machines and claiming one is hot. It's a fair point. I haven't tested many queries across them all and checked their answers, but if I want to ask one of them a question, right now it's Grok, just because I trust its answers more. | ||||||||
| ▲ | ToucanLoucan 2 hours ago | parent [-] | |||||||
It's not a methodology problem, it's a testability problem. LLMs are not deterministic. You can ask the same LLM the same question five times and you'll likely get at least three different answers. Again. Slot machine. | ||||||||
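To make the slot machine point concrete, here is a toy sketch of sampling-based answer selection. It is not any real model's API: the `LOGITS` scores, `sample_answer`, and `ask_five_times` are all hypothetical names made up for illustration. It assumes the model assigns a score to each candidate answer and picks one via temperature-scaled sampling, which is one common source of run-to-run variation.

```python
import math
import random

# Hypothetical scores a model might assign to three candidate answers.
# These numbers are invented for illustration only.
LOGITS = {"answer A": 2.0, "answer B": 1.5, "answer C": 1.0}


def sample_answer(logits, temperature, rng):
    """Pick one answer from the scores using temperature sampling."""
    if temperature == 0:
        # Greedy choice: always return the top-scoring answer.
        return max(logits, key=logits.get)
    answers = list(logits)
    # Unnormalized softmax weights; a higher temperature flattens them.
    weights = [math.exp(logits[a] / temperature) for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


def ask_five_times(temperature, seed=None):
    """Ask the same 'question' five times and collect the answers."""
    rng = random.Random(seed)
    return [sample_answer(LOGITS, temperature, rng) for _ in range(5)]


if __name__ == "__main__":
    print("temperature 0:", ask_five_times(0))
    print("temperature 1:", ask_five_times(1.0))
```

In this toy version, temperature 0 always returns the same answer and a fixed seed makes the sampled runs repeatable. Whether a hosted chatbot exposes either knob is a separate question, so any comparison between them would need many repeated runs per question rather than one answer each.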