>leading large language models – GPT-5.2, Claude Sonnet 4 and Gemini 3 Flash
Err, what? These weren't even leading at the time (except GPT-5.2). It doesn't even mention using chain of thought.