dkersten 3 hours ago
> This is not scientific at all, just vibes, YMMV.

This is the problem. I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.
coldtea an hour ago
> I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

Think of it less like a static tool, and more like a human helper, where the same holds.
couscouspie 2 hours ago
That would be ideal, but AI is less like a tool and more like a human in this regard, and you don't have character sheets for each of your colleagues either.
amelius 2 hours ago
Yes, but benchmarks can be gamed. Maybe we need better reviewers then?
dotancohen 2 hours ago
Honestly, the differences between AI models have always felt to me like the differences between coworkers or job candidates. They don't all share the same strengths and weaknesses - and they all have both good days and bad days. Realising this made me take the "I" in "AI" a bit more seriously.