anywhichway 2 days ago

One potential issue with that approach is that the factors wouldn't stay constant across generations of AI models.

While a lot of people have used various methods to gauge the strength of AI models, one of my favorites is this time horizon analysis [1]. It took coding tasks of various lengths, measured how long those tasks take humans to complete, and compared that to the chance that the AI would successfully complete the task. They then looked at various thresholds to determine how long a task an AI could generally complete at a given success rate. They found that the length of task an AI can complete at a given threshold is doubling about every 7 months.
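
For concreteness, a fixed doubling time implies a simple exponential extrapolation. Here's a minimal Python sketch of that arithmetic; the 60-minute starting horizon and the function name are illustrative assumptions on my part, not numbers from the analysis:

    # Sketch of the extrapolation implied by a fixed doubling time.
    # The starting horizon (60 min at a 50% success threshold) is
    # an illustrative assumption, not a figure from the study.

    def projected_horizon_minutes(h0_minutes: float, months_elapsed: float,
                                  doubling_months: float = 7.0) -> float:
        """Task length (in human-minutes) an AI completes at a fixed
        success threshold, assuming it doubles every `doubling_months`."""
        return h0_minutes * 2 ** (months_elapsed / doubling_months)

    # If a model today handled ~60-minute tasks at the threshold,
    # the same trend would project:
    for months in (0, 7, 14, 28):
        print(f"+{months:>2} months: ~{projected_horizon_minutes(60, months):.0f} min")
    # +0: ~60 min, +7: ~120 min, +14: ~240 min, +28: ~960 min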

The reason I found this an interesting approach is twofold: AI seems to struggle with coding tasks as the problem grows in complexity, and being able to hand it more complex tasks is an important metric both for coding and, more generally, for asking AIs to act as independent agents. In my experience, increasing the complexity of a problem causes a much larger performance falloff for AI than for humans, for whom the task would simply take longer, so this approach makes a lot of intuitive sense to me.

[1] - https://theaidigest.org/time-horizons