adastra22 4 days ago

> for a short while it was a trend in the scientific literature to have LLMs evaluate output of other LLMs? Who knows how correct that was.

Highly reliable. So much so that it is basically how modern LLMs work internally. Speaking from personal experience on the projects I work on, it is also the chief way to counteract hallucination, poisoned context windows, and scaling beyond the interaction limit.
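
As a rough sketch of the pattern (assuming the OpenAI Python SDK; the model names, threshold, and prompt wording here are placeholders, not a recommendation):

    # Minimal LLM-as-judge loop: one model generates, a second call grades,
    # and low-scoring answers are regenerated.
    from openai import OpenAI

    client = OpenAI()

    def generate(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder generator model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def judge(prompt: str, answer: str) -> int:
        # Independent second call evaluates the first model's output.
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder judge model
            messages=[{
                "role": "user",
                "content": (
                    "Rate the answer below for factual accuracy on a 1-10 "
                    "scale. Reply with the integer only.\n\n"
                    f"Question: {prompt}\n\nAnswer: {answer}"
                ),
            }],
        )
        try:
            return int(resp.choices[0].message.content.strip())
        except ValueError:
            return 0  # treat an unparseable grade as a rejection

    def generate_with_check(prompt: str, threshold: int = 7, retries: int = 3) -> str:
        # Regenerate until the judge accepts -- a simple hallucination filter.
        answer = ""
        for _ in range(retries):
            answer = generate(prompt)
            if judge(prompt, answer) >= threshold:
                return answer
        return answer  # fall back to the last attempt

The same structure generalizes: the judge can check an answer against retrieved sources, flag a poisoned context window, or score summaries of a long interaction before they are carried forward.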

LLMs evaluating LLM output works surprisingly well.