| ▲ | ForceBru 3 hours ago | |
IMO "thinking" here means "computation", like running matrix multiplications. Another view could be: "thinking" means "producing tokens". This doesn't require any proof because it's literally what the models do. As I understand it, the claim is: more tokens = more computation = more "thinking" => answer probably better. | ||