ToucanLoucan 2 hours ago
It's not a methodology problem, it's a testability problem. LLMs are not deterministic. You can ask the same LLM the same question five times and you'll likely get at least three different answers. Again. Slot machine.
Ukv 2 hours ago
You can meaningfully test whether one slot machine hits the jackpot more often than another; the methodology just has to involve a large number of repeats rather than a few anecdotes. There are some LLM leaderboard sites that do it with blind comparisons.
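To make the "large number of repeats" point concrete, here is a minimal sketch of one way to compare two success rates: a pooled two-proportion z-test. This is an illustrative choice, not necessarily what any leaderboard site does, and the names `two_proportion_z` and `simulate_wins` plus the 55% and 50% success rates are made up for the example.

```python
import math
import random


def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Return (z, two-sided p-value) for H0: both success rates are equal."""
    # Pooled success rate under the null hypothesis of no difference.
    p_pool = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All wins or all losses on both sides: no evidence of a difference.
        return 0.0, 1.0
    z = (wins_a / n_a - wins_b / n_b) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def simulate_wins(success_rate, trials, rng):
    """Count wins over `trials` independent pulls of a hypothetical machine."""
    return sum(rng.random() < success_rate for _ in range(trials))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for trials in (5, 20_000):
        wins_a = simulate_wins(0.55, trials, rng)  # assumed rate for machine A
        wins_b = simulate_wins(0.50, trials, rng)  # assumed rate for machine B
        z, p = two_proportion_z(wins_a, trials, wins_b, trials)
        print(f"n={trials}: A={wins_a} B={wins_b} z={z:.2f} p={p:.4f}")
```

With a handful of pulls per machine, a 3-vs-2 split is indistinguishable from noise; the same proportions over many more pulls can give a clear signal. For an LLM comparison, a "win" would be something like a blind grader preferring one model's answer.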