jagged-chisel 3 hours ago

> Reproducible would be great

Wouldn’t it be great? I’m still waiting for reproducibility from LLMs.

bko 3 hours ago | parent [-]

Can you reproduce irreproducibility?

Give me a question that the LLM answers vastly differently across runs.

I keep hearing how it's dumb and wrong, but no one ever shares the chat or prompt.

uxhacker 2 hours ago | parent [-]

Try this with ChatGPT, Grok, or Claude:

How many days of the week contain the letter d?

The answer I get is 3 with ChatGPT and Grok, and 6 with Claude.

jagged-chisel an hour ago | parent [-]

I just tried this with ChatGPT only, twice, through the web interface: once in a Firefox private window and once in a Chrome incognito window. I asked both the identical question: "How many names of the days of the week contain the letter D?"

In Firefox I got 6. In Chrome I got 7. LLMs are not even self-consistent.

I have the screenshots if anyone cares.
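
For reference, the question has a deterministic answer that is easy to check by hand or in a few lines of Python. This is only a sketch: the helper name `days_containing` is made up for illustration, and it assumes English day names and a case-insensitive match (matching a capital "D" case-sensitively would find none).

```python
# English names of the days of the week (assumed; other languages differ).
DAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

def days_containing(letter: str) -> list[str]:
    """Return the day names that contain `letter`, ignoring case."""
    needle = letter.lower()
    return [day for day in DAYS if needle in day.lower()]

matches = days_containing("d")
print(len(matches), matches)
```

Every name ends in "day", so all 7 match. None of the answers reported above (3, 6) are correct except the single 7 from the Chrome run.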