AI benchmarks are so strange and confusing for those outside the field.
These "IQ" results are so different from metrics like GPQA, AIME, SWE-bench, etc.
https://artificialanalysis.ai/leaderboards/models