waffletower | 5 days ago
Transformer models, typically architected for and trained on 1d text streams, are not going to perform well on ARC-AGI. I like that the test corpus exists, as I believe it suggests that other model architectures (perhaps co-existing with LLMs in a MoE fashion) are needed to generalize AI performance further. For example, if we constructed a 3d version of ARC-AGI (rather than relying on 2d grids), humans would probably still outperform reasoning LLMs handily. However, expand ARC-AGI to 4d and I think human performance might start to become comparable to LLM performance. 4d is as alien to us as 2d is to LLMs, at least within this narrow test corpus.
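To make the 1d-vs-2d point concrete, here's a toy sketch (my own illustration, not how any particular model actually tokenizes ARC tasks) of what happens when a 2d grid is serialized row-major into the 1d stream an LLM consumes: cells that are vertically adjacent in the grid land far apart in the sequence, so the spatial structure humans see at a glance has to be reconstructed from positional arithmetic.

```python
def flatten(grid):
    """Row-major serialization of a 2d grid, with a newline token between rows."""
    tokens = []
    for row in grid:
        tokens.extend(str(c) for c in row)
        tokens.append("\n")
    return tokens

# A trivially obvious vertical line of 1s in column 1:
grid = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

tokens = flatten(grid)
stride = len(grid[0]) + 1  # +1 for the newline token

# In the 1d stream, the "line" becomes tokens at positions 1, 5, 9 --
# vertical neighbors are `stride` positions apart, not adjacent:
positions = [i for i, t in enumerate(tokens) if t == "1"]
print(positions)  # -> [1, 5, 9]
```

The vertical adjacency that is perceptually immediate in 2d only exists in the stream as a fixed stride the model must infer, which is roughly the handicap the comment is pointing at.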