krapp 4 days ago

I imagine the value of something like this is for business owners to choose which LLMs they can replace their employees with, so its use of human IQ tests is relevant.

azernik 4 days ago | parent [-]

The point is that the correlation between doing well on these tasks and doing well on other (directly useful) tasks is well established for humans, but not well established for LLMs.

If the employees' job is taking IQ tests, then this is a great measure for employers. Otherwise, it doesn't measure anything useful.

bbarnett 4 days ago | parent [-]

"Otherwise, it doesn't measure anything useful."

Oh, it absolutely measures something useful, as aspects of an IQ test validate certain types of cognition. Those types of cognition have been found to map to real-world employment of the same.

If an AI is incapable of performing admirably on an IQ test for those types of cognition, then one thing we're certainly measuring is that it can't handle that 'class' of cognition when the conditions change in even minuscule ways.

And that's quite important.

For example, if the model appears to perform specific work tasks well, tasks related to a class of cognition, but then cannot do the same category of cognitive tasks outside of that scope, we're measuring a lack of adaptability, or of true cognitive capability.

It's definitely measuring something, such as: will the model go sideways with small deviations in task or input? That's a nice start.

azernik 2 days ago | parent [-]

"Those types of cognition have been found to map to real-world employment of the same."

...in humans. That correlation has not been established for LLMs.