versteegen 3 days ago

Interesting. I haven't compared with 4o or GPT4, but I've found that DeepSeek 2.5 seems to be better than Claude 3.5 Sonnet (new) at Julia. That said, I've seen both Claude and DeepSeek make the exact same sequence of errors (when asked about a certain bug, and then given the same reply to their identical mistakes), which shows they don't fully understand the syntax for passing keyword arguments to Julia functions... wow. It wasn't some kind of tricky case, and it wasn't even relevant to the bug. They must have the same bad training data. Oops, that's a digression. Actually, they're both great in general.

hirvi74 a day ago

I can see what you mean by LLMs making the same mistakes. I had that experience with both GPT and Claude, as well.

However, I've found that GPT is better able to correct its mistakes, while Claude essentially just doubles down and keeps regurgitating permutations of the same mistakes.

I can't tell you how many times I have had Claude spit out something like, "Use the Foobar.ToString() method to convert the value to a string." To which I reply something like, "Foobar does not have a method 'ToString()'."

Then Claude will say something like, "You are right to point out that Foobar does not have a .ToString method! Try Foobar.ConvertToString()"

At that point, my frustration level starts to rise rapidly. Have you had experiences like that with Claude or DeepSeek? The main difference with GPT is that it tends to find me the right answer after a bit of back-and-forth (or at least point me in a better direction).