robrenaud 3 days ago
Low scores on HLE and ARC-AGI might be a good sign: they suggest they didn't Goodhart their models. ARC-AGI in particular doesn't mean much, IMO. It's just some weird, hard geometric induction, and I don't think it correlates well with real-world problem solving. AFAICT, Claude Code has the biggest engineering mind share. A software engineer friend of mine at Apple says he sometimes uses $100/day of Claude Code tokens at work and gets sad, because that's the budget. Also, look at costs and revenue: OpenAI is bleeding way more than Anthropic.