wyre 3 hours ago

ARC-AGI is testing raw intelligence, like the raw power of a Formula 1 engine. The rest of the car is the harness.

gchamonlive 2 hours ago

Maybe there is a complex relationship between harness, model, and the emergent perceived intelligence that we just can't capture by isolating the model alone to evaluate "raw intelligence". I don't think it's absurd to imagine a model that wouldn't be very impressive on its own but would outperform other models given the right harness. Nor is it absurd to imagine a model with incredible raw intelligence that doesn't improve much across different harnesses. Model performance across different scenarios depends a LOT on dataset and training strategy, so we need to account for these complex relationships. Otherwise, measuring "raw intelligence" will just be the next AI benchmark that is purely for show.