▲ | Workaccount2 5 days ago
The best benchmark is the community vibe in the weeks following a release. Claude benchmarks poorly but vibes well. Gemini benchmarks well and vibes well. Grok benchmarks well but vibes poorly. (Yes, I know you are gushing with anecdotes; the vibes are simply the approximate shade of gray born from the countless black-and-white remarks.)
▲ | diggan 4 days ago
> The best benchmark is the community vibe in the weeks following a release.
True, just be careful which community you use as a vibe-check. Most of the big mainstream ones around AI and LLMs basically have influence campaigns run against them and are made up of giant hive-minds that all think alike, so you need to carefully assess whether anything you're reading is true, and votes tend to make it even worse.
▲ | wubrr 5 days ago
the vibes are just a collection of anecdotes