The reason for making that observation is that we don't have any other comparable tools and it is more reasonable to benchmark LLMs against what humans are capable of, because whether or not they are good approximations, we are trying to model human abilities.

One day maybe we exceed human abilities, but it's unreasonable to expect early attempts - and they are still early attempts - to do things we don't know how to beat other than by putting all kinds of complex process on top of very flawed human thinking.

▲

ModernMech 6 days ago | parent [-]

We do have comparable tools to LLMs. There are plenty of human-composed tools that can do what LLMs do, like Mechanical Turk for instance. The human-composed tool that most closely resembles an LLM is the "bureaucracy".

An LLM is like a little committee you send a request to, and based on capricious, opaque rules that can change at any time, the committee returns a response that may or may not service your request. You can't know ahead of time if it will, and depending on the time of day, the political environment, or the amount of work before the committee, the delay of servicing and the quality of service may or may not degrade. Quality may range from a quick accurate response, to flat refusal to service without explanation, or outright lies to your face. There's no way to guarantee a good result, and there's no recourse or explanation for why things go wrong or changed.

LLMs feel like a customer service agent turned into a computer program, which is probably why people's first thought was to use LLMs to automate customer service agents. They are a perfect fit there, but I don't want them to be my primary interface to do work. I have enough bureaucracies to deal with as it is.

▲

vidarh 6 days ago | parent [-]

> We do have comparable tools to LLMs. There are plenty of human-composed tools that can do what LLMs do, like Mechanical Turk for instance.

If you are going to treat humans as tools, then sure. In which case measuring LLMs against human ability is exactly the right thing, given that with Mechanical Turk the tasks are carried out by humans - sometimes with the help of LLMs...

It's utterly bizarre to argue over my comparing LLMs to humans when the tools you argue are comparable are humans.

▲

ModernMech 6 days ago | parent [-]

> when the tools you argue are comparable are humans.

No, they are abstractions over humans. A group of people is not a person; it behaves differently than a person does, even though it is composed of people. Abstractions are hard to compare but still much easier than people.

▲

vidarh 6 days ago | parent [-]

This is a meaningless difference that does not alter any of what I wrote.

You're just trying to evade dealing with the contradiction in your argument.

▲

ModernMech 5 days ago | parent [-]

It's not a meaningless difference, it's a crucial one. The contradiction only exists if you collapse all the differences between people and abstractions of people -- crucially that the former are people and the latter are abstractions -- and claim they're the same. Which they are not. Anyway, we've gotten far from the point, which is that LLMs are not people and you can't treat them as such.