homebrewer 16 hours ago

Unless you've then read through those sources — and not asked the machine to summarize them again — I don't see how that changes anything.

Judging by your tone and several assumptions based on nothing, I see that you're fully converted. No reason to keep talking past each other.

CamperBob2 15 hours ago | parent [-]

No, I'm not "fully converted." I reject the notion that you have to join one cult or the other when it comes to this stuff.

I think we've all seen plenty of hallucinated sources, no argument there. Source hallucination wasn't a problem 2-3 years ago, simply because LLMs couldn't cite their sources at all. It was a massive problem 1-2 years ago, because it happened all the freaking time. It is a much smaller problem today, though it still happens too often, especially with the weaker models.

I'm personally pretty annoyed that no local model (at least that I can run on my own hardware) is anywhere near as hallucination-resistant as the major non-free, non-local frontier models.

In my example, no, I didn't bother confirming the Russell sources in detail, other than to check that they (a) existed and (b) weren't completely irrelevant. I had other stuff to do and don't actually care that much. The comment just struck me as weird, and now I'm better informed thanks to Firefox's AI feature. My takeaway wasn't "Russell wanted to nuke the Russians," but rather "Russell's positions on pacifism and aggression were more nuanced than I thought. Remember to look into this further when/if it comes up again." Where's the harm in that?

Can you share what you asked, and what model you were using? I like to collect benchmark questions that show where progress is and isn't happening. If your question actually elicited such a crappy response from a leading-edge reasoning model, it sounds like a good one. But if you really did just issue a throwaway prompt to a free/instant model, then trust me, you got a very wrong impression of where the state of the art really is. The free ChatGPT is inexcusably bad; it was still miscounting the r's in "Strawberry" as late as 5.1.

tsimionescu 10 hours ago | parent [-]

> I'm personally pretty annoyed that no local model (at least that I can run on my own hardware) is anywhere near as hallucination-resistant as the major non-free, non-local frontier models.

And here you get back to my original point: to get good (or at least better) AI, you need huge, complex models that can't realistically be run locally.