Workaccount2 | 2 days ago
I have been on a crusade for about a year now to get people to share chats where SOTA LLMs have failed spectacularly to produce coherent, good information: anything with heavy hallucinations and outright bad information. So far, all I have gotten is data that falls outside the knowledge cutoff (by far the most common) and technically-wrong-information fails (Hawsmer House instead of Hosmer House). I thought I might have hit on something with the recent BBC study about not trusting LLM output, but they used second-shelf, old mid-tier models for their tests; top LLMs answered their test prompts correctly. I'm still holding out for one of those totally off-the-rails Google AI Overviews hallucinations showing up in a top-shelf model.