vidarh 6 days ago
The reason for making that observation is that we don't have any other comparable tools, and it is more reasonable to benchmark LLMs against what humans are capable of because, whether or not they are good approximations, we are trying to model human abilities. One day maybe we will exceed human abilities, but it's unreasonable to expect early attempts - and they are still early attempts - to do things we don't know how to beat other than by putting all kinds of complex process on top of very flawed human thinking.
ModernMech 6 days ago | parent
We do have comparable tools to LLMs. There are plenty of human-composed tools that can do what LLMs do, like Mechanical Turk for instance. The human-composed tool that most closely resembles an LLM is the "bureaucracy". An LLM is like a little committee you send a request to, and based on capricious, opaque rules that can change at any time, the committee returns a response that may or may not service your request. You can't know ahead of time if it will, and depending on the time of day, the political environment, or the amount of work before the committee, the delay of servicing and the quality of service may or may not degrade. Quality may range from a quick, accurate response to a flat refusal to service without explanation, or outright lies to your face. There's no way to guarantee a good result, and there's no recourse or explanation for why things went wrong or changed. LLMs feel like a customer service agent turned into a computer program, which is probably why people's first thought was to use LLMs to automate customer service agents. They are a perfect fit there, but I don't want them to be my primary interface to do work. I have enough bureaucracies to deal with as it is.