FergusArgyll 5 days ago

The biggest issue I have with ARC-AGI is that it's a visual problem. LLMs (even the newfangled multi-modal ones) are still far worse at vision than at purely text-based problems. I don't think it's possible to build a test of purely text-based questions that would be easy for humans and hard for SOTA models. Yes, there are a few gotchas you can throw at them, but not 500.