Remix clone Hacker News

	▲	robrenaud 3 days ago
		Low scores on HLE and ARC AGI might be a good sign: they didn't Goodhart their models. ARC AGI in particular doesn't mean much, IMO. It's just some weird, hard geometry induction, and I don't think it correlates well with real-world problem solving. AFAICT, Claude Code has the biggest engineering mind share. An Apple software engineer I know says he sometimes uses $100/day of Claude Code tokens at work and gets sad, because that's the budget. Also, look at costs and revenue. OpenAI is bleeding way more than Anthropic.