hodgehog11 | 3 days ago
I keep wondering whether people have actually examined how this work draws its conclusions before citing it. This is science at its worst, where you start at an inflammatory conclusion and work backwards. There is nothing particularly novel presented here, especially not in the mathematics; obviously performance will degrade on out-of-distribution tasks (and it will do so for humans under the same formulation), but the real question is how out-of-distribution a lot of tasks actually are if they can still be solved with CoT. Yes, if you restrict the dataset, then it will perform poorly. But humans already have a pretty large visual dataset to pull from, so what are we comparing to here? How do tiny language models trained on small amounts of data demonstrate fundamental limitations? I'm eager to see more work showing the limitations of LLM reasoning, both at small and large scale, but this ain't it. Others have already supplied similar critiques, so let's please stop sharing this one around without a grain of salt.
ipaddr | 3 days ago | parent
"This is science at its worst, where you start at an inflammatory conclusion and work backwards" Science starts with a guess and you run experiments to test. | ||||||||||||||||||||||||||||||||||||||