The entire history of RL-trained "reasoning models," from o1 to DeepSeek-R1, is barely a year old!