> AI has beat world champions at chess and Go, surpassed most humans on SAT and bar exams, and reached gold medal level on IOI and IMO. But the world hasn’t changed much, at least judged by economics and GDP.

> I call this the utility problem, and deem it the most important problem for AI.

> Perhaps we will solve the utility problem pretty soon, perhaps not. Either way, the root cause of this problem might be deceptively simple: our evaluation setups are different from real-world setups in many basic ways.

LLMs are reaching the same stage that most exciting technologies reach. They have quickly attracted lots of investor money, but that is going to have to start turning into actual money. Many research papers are being written, but people are going to start wanting to see actual improvements, not just theoretical improvements on benchmarks.

▲

PaulHoule 3 months ago | parent | next [-]

I think of some of the ways LLMs perform better in real life than they do in evals.

For instance, I often ask AI assistants what some code is trying to do in application software, where it is a matter of React, CSS and how APIs get used. Frequently this is pattern matching that doesn't require deep thought, and I find LLMs often nail it.

When it comes to "what does some systems-oriented code do", you are now looking at halting-problem kinds of problems, or cases where a person will be hypnotized by an almost-bubble-sort into thinking it's a bubble sort, and the LLM is too. You can certainly make code-understanding benchmarks aimed at "whiteboard interview" code that are arbitrarily complex, but that doesn't reflect the ability or inability to deal with "what is up with this API?"
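
To make the "almost-bubble-sort" point concrete, here is a minimal sketch. Both functions and the off-by-one bug are made up for illustration, not taken from any real benchmark. The two loops look nearly identical at a glance, but the second one never compares the last pair on the first pass.

```python
def bubble_sort(items):
    """Textbook bubble sort: repeated passes of adjacent swaps."""
    result = list(items)
    n = len(result)
    for i in range(n):
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            if result[j] > result[j + 1]:
                result[j], result[j + 1] = result[j + 1], result[j]
    return result


def almost_bubble_sort(items):
    """Hypothetical look-alike: the inner bound is off by one,
    so some adjacent pairs are never compared."""
    result = list(items)
    n = len(result)
    for i in range(n):
        for j in range(n - 2 - i):  # bug: should be n - 1 - i
            if result[j] > result[j + 1]:
                result[j], result[j + 1] = result[j + 1], result[j]
    return result


print(bubble_sort([3, 1, 2]))         # [1, 2, 3]
print(almost_bubble_sort([3, 1, 2]))  # [1, 3, 2]
```

On already-sorted input the two agree, which is part of why a quick read (by a person or a model) can pattern-match the second one to "bubble sort".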

▲

animuchan 3 months ago | parent [-]

I think what you're describing is that easy tasks are easy to perform.

Which is, of course, true. Anecdotally, a lot of the value I get from Copilot is in simple, mundane tasks.

▲

PaulHoule 3 months ago | parent [-]

I think easy tasks are basically "linear" in that you don't have interactions between components. If you do have interactions between components, complexity gets out of control very quickly. Many practical problems, for instance, are NP-complete or undecidable. Many of them could be attacked with SMT or SAT solvers, but often you can solve them using tactics from math.

▲

pjc50 3 months ago | parent | prev | next [-]

See Solow Paradox (article 2018): https://www.technologyreview.com/2018/06/18/104277/the-produ...

▲

stapedium 3 months ago | parent | prev [-]

Current AI is like search. You still have to know the vocabulary and the right questions to ask. You also need the ability to differentiate a novel answer from a hallucination. It's not going to replace lawyers or doctors any time soon.