I don't quite get it why they can't take another LLM and vet the output of the first with the second one. Surely they would not have the same hallucinations and would be able to detect hallucinations of the earlier LLM. Maybe it would cost too much in terms of tokens?

I don't know, but I would expect it to be relatively easy for an LLM to detect "hallucinations".

▲

mindcrime 3 hours ago | parent | next [-]

> I don't quite get it why they can't take another LLM and vet the output of the first with the second one.

Yes, this technique and its variations[1][2] "work", but the result is still not 100% reliable. And it's not as widely used as it might be because, among other reasons:

a. it takes longer to implement

b. it costs more (more tokens spread across multiple LLM calls)

c. it adds latency (getting an answer takes longer because multiple LLM calls are involved)

d. the final answer is probabilistically more likely to be correct, but is still not guaranteed to be error-free, so you can never fully escape the need for a Human in the Loop.

[1]: https://en.wikipedia.org/wiki/LLM-as-a-Judge

[2]: https://github.com/karpathy/llm-council
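To make the trade-offs in (b) through (d) concrete, here is a rough sketch of the generate-then-judge loop. Everything in it is an assumption for illustration: `generate_and_vet`, `Verdict`, the SUPPORTED/UNSUPPORTED reply convention, and the two stub "models" are hypothetical names, not the API of any real library or of the projects linked above. A real setup would swap the stubs for actual LLM client calls.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: an "LLM" is modeled as any function from prompt text to reply text.
LLM = Callable[[str], str]


@dataclass
class Verdict:
    answer: str      # last answer produced by the generator
    approved: bool   # True only if the judge accepted that answer
    llm_calls: int   # rough proxy for token cost and latency


def generate_and_vet(question: str, generator: LLM, judge: LLM,
                     max_attempts: int = 2) -> Verdict:
    """Ask the generator, have a second model vet the answer, retry on rejection."""
    calls = 0
    answer = ""
    for _ in range(max_attempts):
        answer = generator(question)
        calls += 1
        reply = judge(
            f"Question: {question}\nAnswer: {answer}\n"
            "Reply SUPPORTED or UNSUPPORTED."
        )
        calls += 1
        # "UNSUPPORTED" does not start with "SUPPORTED", so this check is safe.
        if reply.strip().upper().startswith("SUPPORTED"):
            return Verdict(answer, True, calls)
    # The judge never approved: this is where a human has to step in.
    return Verdict(answer, False, calls)


# Hypothetical stand-ins for real models, so the sketch runs offline.
def stub_generator(prompt: str) -> str:
    return "Paris is the capital of France."


def stub_lenient_judge(prompt: str) -> str:
    return "SUPPORTED"


def stub_strict_judge(prompt: str) -> str:
    return "UNSUPPORTED"


accepted = generate_and_vet("What is the capital of France?",
                            stub_generator, stub_lenient_judge)
rejected = generate_and_vet("What is the capital of France?",
                            stub_generator, stub_strict_judge)
```

Even in the best case every answer costs at least two calls instead of one, and a rejection after `max_attempts` still ends with an unapproved answer that someone has to review. An approval only means the judge agreed, not that the answer is correct.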

▲

s0ulf3re 28 minutes ago | parent | prev | next [-]

I am not sure this would solve the overall problem, the main one being lack of oversight. The solution to a social issue generally isn’t to throw more technology at it.

	▲	s0ulf3re 18 minutes ago | parent [-]
		IBM once said “a computer can never be held accountable. Therefore a computer must never make a management decision”

▲

operatingthetan 3 hours ago | parent | prev | next [-]

> I don't quite get it why they can't take another LLM and vet the output of the first with the second one.

I think this may be part of the problem. The actual humans creating the report don't have the expertise to know which one to trust. At least that was what consulting was like in my experience at a similar firm.

▲

TZubiri 3 hours ago | parent | prev | next [-]

Because they used LLMs to do the work. What you are suggesting is to use the LLMs to create more work, which is counter to the shortcut they were trying to take.

	▲	galaxyLogic 2 hours ago | parent [-]
		Good point, with some irony. They don't want to do a better job, they want to do an easier job. But a company like E&Y should realize shortcuts like these don't work. And their customers are paying them.

▲

voxl 3 hours ago | parent | prev [-]

[flagged]