bko 3 hours ago
> Hallucinations and poor guidance are still a regular day-to-day issue that makes it impossible for me to trust agents with anything.

I often hear this. Can you give me a question where a major LLM hallucinates or provides poor guidance? Reproducible would be great. Just a question to stump it.

atomicnumber3 3 hours ago
Just today, the LLM-based auto-review that my company enabled for all PRs edited my PR description to confidently assert that I had added a new RPC. I had not. I deleted code and nothing else. Nothing was added. The RPC it claimed I added did not exist.

This is a common occurrence.

al_borland 2 hours ago
LLMs are nondeterministic, so it’s impossible to make something 100% reproducible. Even if it has an issue, it might do it in a different way. If it’s well publicized, they’ll patch that very specific example, but the foundational issue is still there (like counting the R’s in strawberry).

I still regularly run into the issue where it just makes up API endpoints, CLI commands, or adds flags that simply don’t exist.

I also regularly ask it things and it gives me a bad answer, so I push back, and it says something to the effect of “you’re right, I didn’t consider that, let me look at that more”… then tells me the exact opposite of the previous response. Or it says “thing X has never happened”, and I ask what about <insert example>, and it goes to look it up and says, “oh, thing X actually did happen.”

I run into this daily. Multiple times per day. How can I trust a system like this? Are people just blindly accepting what the LLM says as truth? Is that why people think it’s good?

jagged-chisel 3 hours ago
> Reproducible would be great

Wouldn’t it be great? I’m still waiting for reproducibility from LLMs.