Remix clone Hacker News

	MrOrelliOReilly 2 days ago
		Yes, I feel that the "official" benchmarks are increasingly diverging from the everyday reality of using these models. My theory is that we are reaching a point where all the models are intelligent enough for day-to-day queries, so factors like style/personality, proper use of web queries, and other capabilities are better differentiators than raw intelligence.
	int_19h 10 hours ago
		The benchmarks haven't reflected the real utility for a very long time. At best they tell you which models are definitely bad.