sdenton4 a day ago
I like the high-level idea! (How do we test intelligence in a non-functional way?) In effect, the different response types are measuring how the models respond to a context-free, novel environment. I imagine humans would also respond in a variety of ways to this test, none of which are necessarily incorrect from the perspective of intelligence testing. Many tests of human behavior (e.g., in behavioral economics) create some pretext context to avoid biasing the response that is actually being measured. For example, we may invite a participant to a study of color preference, but actually measure how fast they complete the task when the scientist has/hasn't bathed in a week (or whatever). Likewise, for LLM intelligence testing, you could create pretext tasks and context, and perhaps measure what the model considered along the way instead of the actual task outcome.