kalcode 5 hours ago

I've tried these with Claude various times and never get the wrong answer. I don't know why, but I'm leaning toward them having something like "memory" turned on, and possibly reusing sessions for everything? That's the only thing that explains it to me.

If you're always messing with the AI, it might be making memories and setting expectations. Or it's the randomness. But I turned memories off; I don't like cross-chat context infecting my conversations, and at worst it suggested "walk over and see if it is busy, then grab the car when the line isn't busy".

jorvi 5 hours ago | parent [-]

Even Gemini with no memory does hilarious things. Like, if you ask it how heavy the average man is, you usually get the right answer but occasionally you get a table that says:

- 20-29: 190 pounds
- 30-39: 375 pounds
- 40-49: 750 pounds
- 50-59: 4900 pounds

Yet somehow people believe LLMs are on the cusp of replacing mathematicians, traders, lawyers, and whatnot. At least for code you can write tests, but even then, how are you gonna trust something that can casually make such obvious mistakes?
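To make the "tests" caveat concrete: a test only pins down the cases you thought to write. A rough sketch in Python; average_weight_lbs is a made-up example, not real model output:

    # Hypothetical LLM-generated helper.
    def average_weight_lbs(weights):
        return sum(weights) / len(weights)

    def test_average_weight():
        # Passes, but says nothing about empty lists, mixed units,
        # or any of the other quiet ways the model could go wrong.
        assert average_weight_lbs([180, 200]) == 190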

nickjj 3 hours ago | parent | next [-]

Yeah, ChatGPT's paid version is wildly inaccurate on very important and very basic things. I never got on board with AI to begin with, but nowadays I don't even load it unless I'm really stuck on something programming-related.

dyauspitr 5 hours ago | parent | prev [-]

So what? That might happen one time out of 100. Even if it's 1 in 10, who cares? Math is verifiable. You've just saved yourself weeks or months of work.

icedchai 4 hours ago | parent [-]

You don't think these errors compound? Generated code involves hundreds of little decisions. Yes, it "usually" works.
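Back of the envelope (with made-up but illustrative numbers, and assuming the decisions are independent): even a 99% hit rate per decision leaves a bug as the expected outcome.

    # If each of n independent decisions is wrong with probability p,
    # P(at least one error) = 1 - (1 - p)^n
    p, n = 0.01, 200
    print(1 - (1 - p) ** n)  # ~0.87: a bug somewhere is the likely case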

russfink an hour ago | parent | next [-]

LLMs: sometimes wrong, but never in doubt.

dyauspitr 4 hours ago | parent | prev [-]

Not in my experience. With a proper TDD framework, it does better than most programmers at a company, who anecdotally ship a bug every 2-3 tasks.
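By "proper TDD framework" I mean the usual red-green loop. A minimal sketch, assuming pytest; slugify is just an illustrative function:

    # Write the failing test first, then let the model implement
    # until it passes.
    def slugify(title: str) -> str:
        return "-".join(title.lower().split())

    def test_slugify():
        assert slugify("Hello World") == "hello-world"
        assert slugify("  Spaced   Out ") == "spaced-out"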

tranceylc 12 minutes ago | parent [-]

The kinds of mistakes it makes are usually strange and inhuman, though: getting the hard parts correct while getting something fundamental about the same problem wrong, and not in an "easy to miss or typo" way.

I wish I had an example saved for you, but it happens to me pretty frequently. Not only that, it also usually gets testing wrong at a fundamental level, or builds tests around incorrect assumptions.
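A contrived sketch of the pattern (parse_price is made up, not from a real session): the test encodes the same wrong assumption as the implementation, so it passes and "verifies" the bug instead of catching it.

    def parse_price(s: str) -> float:
        # Wrong assumption baked in: "." is always a decimal
        # separator, so the European-style "$1.299" parses as ~$1.30.
        return float(s.replace("$", ""))

    def test_parse_price():
        # Built around the same incorrect assumption, so it passes.
        assert parse_price("$1.299") == 1.299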