WhitneyLand | 5 hours ago
>> benchmarks are meaningless

No, they're not. Maybe you mean they don't tell the whole story or have their limitations, which has always been the case.

>> my fairly basic python benchmark

I suspect your definition of "basic" may not be consensus. GPT-5 Thinking is a strong model for basic coding, and it'd be interesting to see a simple Python task it reliably fails at.
4 hours ago | parent | next
[deleted]
NaomiLehman | 4 hours ago | parent | prev
They are not meaningless, but when you work a lot with LLMs and know them VERY well, a few varied, complex prompts tell you all you need to know about things like EQ, sycophancy, and creative writing. I like to compare them on chathub using the same prompts.

Gemini still calls me "the architect" in half of its responses. It's very cringe.