▲ | hirvi74 3 days ago | |||||||
I alluded to this in another comment, but I have 4o to be better than Sonnet in Swift, Obj-C, and Applescript. In my experiences, Claude is worse than useless with those three languages when compared to GPT. Everything else, I'd say the differences haven't been too extreme. Though, o1-preview absolutely smokes both in my experiences too, but it isn't hard for me to hit it's rate limit either. | ||||||||
▲ | versteegen 3 days ago | parent | next [-] | |||||||
Interesting. I haven't compared with 4o or GPT4, but I found DeepSeek 2.5 seems to be better than Claude 3.5 Sonnet (new) at Julia. Although I've seen both Claude and DeepSeek make the exact same sequence of errors (when asked about a certain bug and then given the same reply to their identical mistakes) that shows they don't fully understand the syntax for passing keyword arguments to Julia functions... wow. It was not some kind of tricky case or relevant to the bug. Must have same bad training data. Oops, that's diversion. Actually they're both great in general. | ||||||||
| ||||||||
▲ | rafaelmn 3 days ago | parent | prev [-] | |||||||
Having used o1 and Claude through Copilot in VSC - Claude is more accurate and faster. A good example is the "fix test" feature is almost always wrong with o1, Claude is 50/50 I'd say - enough to try. Tried on Typescript/node and Python/Django codebases. None of them are smart enough to figure out integration test failures with edge cases. |