▲ | sigmoid10 4 days ago |
Skepticism is an understatement. There are tons of issues with this paper. Why are they comparing results of their expert model, which was trained from scratch on a single task, to general-purpose reasoning models? It is well established in the literature that you can still beat general-purpose LLMs on narrow-domain tasks with small, specially trained models. The only comparison that would have made sense is one to vanilla transformers with the same number of parameters, trained on the same input-output dataset. But the paper shows no such comparison. In fact, I would be surprised if it were significantly better, because such architecture improvements are usually very modest or not applicable in general. And insinuating that this is some significant development to improve general-purpose AI by throwing in ARC is just straight up dishonest. I could probably cook up a neural net in PyTorch in a few minutes that solves a hand-crafted single task that o3 can't solve in an hour. That doesn't mean that I made any progress towards AGI.
▲ | bubblyworld 4 days ago |
Have you spent much time with the ARC-1 challenge? Their results on that are extremely compelling: close to the initial competition's SOTA (as of closing anyway) with a tiny model and none of the hacks like data augmentation, pretraining, etc. that all of the winning approaches leaned on heavily. Your criticism makes sense for the maze solving and sudoku sets, of course, but I think it kinda misses the point (there are traditional algos that solve those just fine - it's more about the ability of neural nets to figure them out during training, and known issues with existing recurrent architectures). Assuming this isn't fake news lol.