Those stats don't necessarily line up that way. Do you have a link?
Given the way the test was structured, they do line up.
https://arxiv.org/abs/2503.23674
Surprisingly good. I wonder how they would have done without the 5-minute limit on conversations (average of 8 messages per convo, per the study).