jmathai 3 days ago

I’ve been using the API for a few weeks and routinely get 529 “overloaded” errors. I’m not sure if that’s always been the case, but it certainly makes the API unsuitable for production workloads, since these episodes can last hours at a time.
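
In the meantime, a retry loop with exponential backoff and jitter at least smooths over the shorter spikes (it won’t save you from an hours-long outage). A minimal sketch with the Anthropic Python SDK; the model id and retry limits are placeholder assumptions:

    import random, time
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def ask_with_backoff(prompt, retries=6):
        delay = 1.0
        for _ in range(retries):
            try:
                msg = client.messages.create(
                    model="claude-3-5-sonnet-20241022",  # assumed model id
                    max_tokens=1024,
                    messages=[{"role": "user", "content": prompt}],
                )
                return msg.content[0].text
            except anthropic.APIStatusError as e:
                if e.status_code != 529:  # only retry on "overloaded"
                    raise
                time.sleep(delay + random.uniform(0, delay))  # backoff + jitter
                delay *= 2
        raise RuntimeError("still overloaded after retries")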

Hopefully they can add the capacity needed because it’s a lot better than GPT-4o for my intended use case.

rmbyrro 3 days ago | parent | next [-]

Sonnet is better than 4o for virtually all use cases.

The only reason I still use OpenAI's API and chatbot service is o1-preview. o1 is like magic. Everything Sonnet and 4o do poorly, o1 solves like it's a piece of cake. Architecting, bug fixing, planning, refactoring: o1 has never let me down on any 'hard' task.

A nice combo is having o1 guide Sonnet. I ask o1 to come up with a solution and an explanation, then simply feed its response into Sonnet to execute. Running that through Aider really feels like futuristic stuff.
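
For anyone who wants to try that combo outside Aider, the plumbing is just two API calls: ask o1 for a plan, then hand the plan to Sonnet to implement. A rough sketch; the model ids and prompt wording are my own assumptions, not anything official:

    import anthropic
    from openai import OpenAI

    oai = OpenAI()               # OPENAI_API_KEY in the environment
    ant = anthropic.Anthropic()  # ANTHROPIC_API_KEY in the environment

    def plan_then_execute(task: str) -> str:
        # Step 1: o1 writes the solution outline and the reasoning behind it.
        plan = oai.chat.completions.create(
            model="o1-preview",
            messages=[{"role": "user",
                       "content": f"Propose a solution, with explanation:\n{task}"}],
        ).choices[0].message.content

        # Step 2: Sonnet executes the plan.
        result = ant.messages.create(
            model="claude-3-5-sonnet-20241022",  # assumed model id
            max_tokens=4096,
            messages=[{"role": "user",
                       "content": f"Implement this plan exactly:\n\n{plan}"}],
        )
        return result.content[0].text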

gcanko 3 days ago | parent | next [-]

Exactly my experience as well. Sonnet can help me in 90% of cases, but there are some specific edge cases where it struggles that o1 solves in an instant. I kinda hate having to pay for both of them.

andresgottlieb 3 days ago | parent | next [-]

You should check out LibreChat. You can connect different models to it and, instead of paying for both subscriptions, just buy credits for each API.

cruffle_duffle 3 days ago | parent | next [-]

> just buy credits for each API

I’ve always considered doing that, but do you come out ahead cost-wise?

esperent 3 days ago | parent [-]

I've been using Claude 3.5 over the API for about 4 months on $100 of credit. I use it fairly extensively, on mobile and my laptop, and I expected to run out of credit ages ago. However, I am careful to keep chats fairly short, as it's long chats that eat up the credit.

So I'd say it depends. For my use case it's about even, but the API provides better functionality.
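
For a rough sense of why long chats are the expensive part: every message resends the entire history as input tokens, so cost grows roughly quadratically with the number of turns. A back-of-envelope sketch, assuming Sonnet's roughly $3 / $15 per million input/output tokens and a made-up message size:

    def chat_cost(turns, tokens_per_msg=500,
                  in_per_tok=3e-6, out_per_tok=15e-6):
        # Turn t resends all 2*t earlier messages plus the new one.
        input_tokens = sum((2 * t + 1) * tokens_per_msg for t in range(turns))
        output_tokens = turns * tokens_per_msg
        return input_tokens * in_per_tok + output_tokens * out_per_tok

    print(f"${chat_cost(10):.2f}")   # 10-turn chat:  ~$0.23
    print(f"${chat_cost(100):.2f}")  # 100-turn chat: ~$15.75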

joseda-hg 3 days ago | parent | prev [-]

How does the cost compare?

rjh29 3 days ago | parent | prev [-]

I use Tabnine; it lets you switch models.

hirvi74 3 days ago | parent | prev [-]

I alluded to this in another comment, but I have found 4o to be better than Sonnet in Swift, Obj-C, and AppleScript. In my experience, Claude is worse than useless with those three languages compared to GPT. For everything else, I'd say the differences haven't been too extreme. Though o1-preview absolutely smokes both in my experience too, it isn't hard for me to hit its rate limit either.

versteegen 3 days ago | parent | next [-]

Interesting. I haven't compared with 4o or GPT-4, but I found DeepSeek 2.5 seems to be better than Claude 3.5 Sonnet (new) at Julia. That said, I've seen Claude and DeepSeek make the exact same sequence of errors (when asked about a certain bug and then given the same reply to their identical mistakes), which shows they don't fully understand the syntax for passing keyword arguments to Julia functions. It wasn't some tricky case, or even relevant to the bug, so they must share the same bad training data. But that's a digression; they're both great in general.

hirvi74 a day ago | parent [-]

I can see what you mean by LLMs making the same mistakes. I had that experience with both GPT and Claude, as well.

However, I found that GPT was better able to correct its mistakes while Claude essentially just doubles down and keeps regurgitating permutations of the same mistakes.

I can't tell you how many times I've had Claude spit out something like, "Use the Foobar.ToString() method to convert the value to a string." To which I reply something like, "Foobar does not have a method 'ToString()'."

Then Claude will say something like, "You are right to point out that Foobar does not have a .ToString method! Try Foobar.ConvertToString()"

At that point, my frustration levels start to rapidly increase. Have you had experiences like that with Claude or DeepSeek? The main difference with GPT is that GPT tends to find me the right answer after a bit of back-and-forth (or at least point me in a better direction).

rafaelmn 3 days ago | parent | prev [-]

Having used o1 and Claude through Copilot in VS Code: Claude is more accurate and faster. A good example is the "fix test" feature, which is almost always wrong with o1; Claude is 50/50 I'd say, enough to be worth trying. Tried on TypeScript/Node and Python/Django codebases.

None of them are smart enough to figure out integration test failures with edge cases.

AlexAndScripts 3 days ago | parent | prev [-]

Amazon Bedrock supports Claude 3.5, and you can use inference profiles to split it across multiple regions. It's also the same price.

For my use case I use a hybrid of the two (the direct Anthropic API and Bedrock), simulating standard rate limits and backing off on 529s. It's pretty reliable that way.
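
A minimal sketch of that kind of hybrid, assuming boto3's Bedrock Converse API; the model ids (including the cross-region "us." inference-profile prefix) are assumptions you'd check against your own account:

    import anthropic
    import boto3

    direct = anthropic.Anthropic()
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    def ask(prompt: str) -> str:
        try:
            r = direct.messages.create(
                model="claude-3-5-sonnet-20241022",  # assumed model id
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return r.content[0].text
        except anthropic.APIStatusError as e:
            if e.status_code != 529:  # fall back only on "overloaded"
                raise
        # Bedrock fallback; the "us." prefix names a cross-region
        # inference profile that spreads load over several US regions.
        r = bedrock.converse(
            modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",  # assumed id
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return r["output"]["message"]["content"][0]["text"]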

Just beware that the European AWS regions have been overloaded for about a month. I had to switch to the American ones.