>If an LLM can solve a complex problem 50% of the time, then that is still very valuable

I'd adjust that statement: if an LLM can solve a complex problem 50% of the time and I can evaluate the correctness of the output, then that is still very valuable. I've seen too many people blindly pass on LLM output - for a short while it was a trend in the scientific literature to have LLMs evaluate output of other LLMs? Who knows how correct that was. Luckily that has ended.
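To make the "50% plus a way to check" point concrete, here is a minimal sketch of a generate-and-verify loop. The names `first_verified` and `chance_of_success` are hypothetical, `generate` stands in for whatever produces a candidate answer, and the probability helper assumes attempts are independent, which real LLM samples may not be.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_verified(
    generate: Callable[[], T],
    verify: Callable[[T], bool],
    max_attempts: int,
) -> Tuple[Optional[T], int]:
    """Return the first candidate that passes `verify`, plus the attempts used.

    Returns (None, max_attempts) if no candidate passes.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = generate()
        if verify(candidate):
            return candidate, attempt
    return None, max_attempts


def chance_of_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


# With a 50% per-attempt success rate, five independent attempts succeed
# at least once with probability 1 - 0.5**5.
print(chance_of_success(0.5, 5))
```

Without a trustworthy `verify`, the loop has no way to tell a good candidate from a bad one, which is the whole caveat.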

danpalmer 4 days ago | parent | next [-]

> I've seen too many people blindly pass on LLM output

I misread this the first time and realised both interpretations are happening. I've seen people copy-paste out of ChatGPT without reading, and I've seen people "pass on" or reject content simply because it has been AI generated.

adastra22 4 days ago | parent | prev | next [-]

> for a short while it was a trend in the scientific literature to have LLMs evaluate output of other LLMs? Who knows how correct that was.

Highly reliable. So much so that it is basically how modern LLMs work internally. Also, speaking from personal experience in the projects I work on, it is the chief way to counteract hallucination and poisoned context windows, and to scale beyond the interaction limit.

LLMs evaluating LLM output works surprisingly well.

sothatsit 4 days ago | parent | prev | next [-]

True! This is what has me more excited about LLMs producing Lean proofs than written maths proofs. The Lean proofs can be proved to be correct, whereas the maths proofs require experts to verify them and look for mistakes.

That said, I do think there are lots of problems where verification is easier than doing the task itself, especially in computer science. Actually, I think it is easier to list the tasks that aren't easier to verify than to do from scratch. Security is one major one.
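A small illustration of "easier to verify than to do", using integer factorization as an assumed example (it is not from the thread). The function names are hypothetical. Checking a claimed factorization takes a handful of multiplications, while finding one by trial division takes on the order of sqrt(n) steps.

```python
from typing import List


def verify_factorization(n: int, factors: List[int]) -> bool:
    """Cheap check: every factor is > 1 and their product equals n.

    Note: this does not check that the factors are prime.
    """
    product = 1
    for f in factors:
        if f <= 1:
            return False
        product *= f
    return product == n


def factor_by_trial_division(n: int) -> List[int]:
    """Expensive search: find the prime factors of n by trying divisors in order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


claimed = [83, 97]  # e.g. an answer handed to you by someone else
print(verify_factorization(8051, claimed))
```

The verifier never needs to know how the answer was produced, only whether it multiplies out.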

hansvm 4 days ago | parent [-]

Even there it's risky. LLMs are good at subtly misstating the problem, so it's relatively easy to make them prove things which look like the thing you wanted but which are mostly unrelated.
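A toy Lean 4 sketch of what a misstated theorem can look like (the theorem name is made up for illustration). The conclusion `n * n < n` is false for every natural number, but the extra hypothesis `n < 0` can never hold, so the statement is vacuously true and Lean accepts the proof.

```lean
-- Lean checks the proof, not whether the statement is the one you meant.
-- `h : n < 0` is impossible for a natural number, so anything follows from it.
theorem looks_like_a_result (n : Nat) (h : n < 0) : n * n < n :=
  absurd h (Nat.not_lt_zero n)
```

A reader skimming only the conclusion would come away with a claim that was never actually established.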

sothatsit 4 days ago | parent [-]

Yes, Lean only lets you be confident in the contents of the proof, not how it was formed. But, I still think that's pretty cool and valuable.

empiko 4 days ago | parent | prev [-]

> Who knows how correct that was. Luckily that has ended.

What do you mean it ended? I still see tons of NLP papers with this methodology.