Remix clone Hacker News

	▲	fastball 3 hours ago
		Not sure I follow. Anthropic included benchmarks where GPT 5.5 outperforms Claude 4.8. Sure, maybe that is a strategic error, but that doesn't seem to indicate benchmarks can't be trusted (I personally don't trust them, but not because of this).