> This is not scientific at all, just vibes, YMMV.

This is the problem.

I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

coldtea an hour ago | parent | next [-]

>I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

Think of it less like a static tool, and more like a human helper, where the same holds.

	▲	cassianoleal 3 minutes ago | parent | next [-]
		They are not human. Humans have names, faces, voices, personality, a personal history, family, care for whatever they call their community. With humans it's actually good and worthwhile to create and strengthen connections. With an LLM, that's psychosis.
	▲	ACCount37 an hour ago | parent | prev | next [-]
		One issue with that is that human helpers last longer. LLMs cycle in and out in months, and what held for Your Favorite LLM 6.7 may not hold for Your Favorite LLM 6.9.
	▲	madeofpalk an hour ago | parent | prev | next [-]
		Except that every model and version is like a different person whose idiosyncrasies you have to relearn every other month. It's a very, very bizarre way to use a computer. Personally, I just don't. I'll use and prompt the LLMs the way that feels natural to me and move on with my life. Maybe I don't always get completely optimal results from them, but I'm also not spending half my day pleading with the computer to do a task.
	▲	gib444 an hour ago | parent | prev | next [-]
		No, I won't anthropomorphise LLMs.
	▲	dreambuffer 21 minutes ago | parent | prev [-]
		Please do not think of LLMs as human helpers; that is a recipe for long-term sociopathy.

couscouspie 2 hours ago | parent | prev | next [-]

That would be ideal, but AI is less like a tool and more like a human in this regard, and you don't have character sheets for each of your colleagues either.

	▲	supergarfield an hour ago | parent | next [-]
		If my coworker was part of a clone series of 100 million units, requesting a character sheet would be pretty reasonable
	▲	bluegatty an hour ago | parent | prev [-]
		These are $1 trillion companies that can't produce explicit details on how their products work? It's nonsense.

amelius 2 hours ago | parent | prev | next [-]

Yes, but benchmarks can be gamed.

Maybe we need better reviewers then?

dotancohen 2 hours ago | parent | prev [-]

Honestly, the differences between AI models always felt to me like the differences between coworkers or job candidates. They don't all share the same strengths and weaknesses - and they all have both good days and bad days.

Realising this made me take the "I" in "AI" a bit more seriously.