| ▲ | scottyah 3 hours ago | |
If you spend so little time comparing models that you don't know about benchmarks, why do you care where people draw the line for SOTA? | ||
| ▲ | mikkupikku 3 hours ago | |
The benchmark game is wholly gamed, but the proof is in the pudding. I know people using Anthropic, OpenAI, and Gemini, and people running Chinese models locally. But who uses Grok for anything but porn? Whatever the benchmarks might say, Grok is just trash in practice. They spent too much time teaching it to be edgy and not enough time teaching it to code. | ||