> if an AI is confidently telling you something wrong it's hard to work with.

But they all do that. It just comes with the territory. Grok will absolutely do the same thing another time you try it.

aembleton 13 minutes ago | parent | next [-]

> Grok will absolutely do the same thing another time you try it.

True; it just hasn't happened yet. It will at some point, though. With the Sunnypilot example, it told me outright that it is not possible on that fork, which I appreciated. The others all seem to hallucinate some setting.

ToucanLoucan 2 hours ago | parent | prev | next [-]

It is really, really genuinely concerning how many people think there are profound measurable differences between these things.

Like yeah tonally I guess there are. But with regard to references and information? You’re literally just using three different slot machines and claiming one is hot.

I suppose I shouldn’t be that surprised, though, since Vegas and every other casino on Earth have been built on duping people in that exact way.

aembleton 17 minutes ago | parent [-]

> You’re literally just using three different slot machines and claiming one is hot.

It's a fair point. I haven't tested many queries across them all and checked their answers, but if I want to ask one of them a question - right now it's Grok, just because I trust its answers more.

ToucanLoucan 8 minutes ago | parent [-]

It's not a methodology problem, it's a testability problem. LLMs are not deterministic. You can ask the same question to the same LLM five times and you'll likely get at least three answers. Again: slot machine.

cyanydeez 2 hours ago | parent | prev [-]

humans make poor scientists. most people have already made a decision before they run any tests.

the smartest among them just make the tests complicated and biased; the less intelligent just cherry pick.

of course, would you really expect anyone to do real research in this economy?