The benchmarks on all these models are meaningless

Why, and what would a good benchmark look like?

	moffkalast 9 hours ago
		Have 30 people each try every model on the list on their own use case for a week, then check which ones they're still using a month later.