Remix clone Hacker News

	▲	MattSayar 6 hours ago
		I recognize the sarcasm. However, the data I can find suggests it's performing at baseline: https://marginlab.ai/trackers/claude-code/
	▲	ACCount37 6 hours ago
		Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in reality.