bomewish 4 days ago

Has CC become much stupider in recent weeks, or is it me? Any anecdata out there?

_--__--__ 4 days ago | parent | next [-]

People speculate somewhat seriously that Claude (especially given its French name) picked up at some point that you aren't supposed to work as hard in July and August.

sunaookami 4 days ago | parent | next [-]

The one guy on Twitter who posted this wrote it as a joke and everyone took it seriously. It's not true. It works the same for me.

oc1 4 days ago | parent [-]

How do you know? It has been acting much lazier in these summer months for me.

stavros 4 days ago | parent [-]

How have you disproved the hypothesis that it recently got dumber and it just happens to be summer?

AbstractH24 4 days ago | parent [-]

Clearly, it compared performance to last summer

(Just to be clear, I have no idea what on this thread to take seriously and what not, or who is being serious. I'm joking, at least.)

stavros 4 days ago | parent [-]

That won't do it, though, you'd have to observe it being dumber on June 1 and smart again on September 1 for years.

madrox 4 days ago | parent | prev [-]

How long before we hire psychiatrists instead of engineers to debug AI?

OrsonSmelles 4 days ago | parent | next [-]

Well, we could start with some ELIZA instances.

lubujackson 4 days ago | parent [-]

I see that you feel we could start with some ELIZA instances. Can you tell me more about that?

taneq 4 days ago | parent | prev | next [-]

Robopsychologists, you say?

nialse 4 days ago | parent | prev [-]

To be frank, psychiatrists, being MDs, would likely prescribe medication, and I’m not sure how that would help. As a licensed psychologist, I have ideas on how to debug AI though.

AbstractH24 4 days ago | parent [-]

Why, we'll just have specialized agents for ingesting Prozac and that'll magically solve everything.

nico 4 days ago | parent | prev | next [-]

I don’t know about stupider, but definitely less reliable/available.

A couple of days ago I was getting so many API errors/timeouts that I decided to upgrade from the $20 to the $100 plan (I was also regularly hitting rate limits).

It seemed to fix the issue immediately. But today, the errors came back for about half an hour

SOLAR_FIELDS 4 days ago | parent | next [-]

It goes down usually around 1400-1500 UTC. Europeans are still awake and once the west coast joins in the fray Anthropic falls over.

Pretty rare to get a 529 outside of that time window in my personal experience, at least during the USA day.

data-ottawa 4 days ago | parent | prev [-]

Their status page for the week is rough. They’re down to 98% uptime.

Hopefully they work out whatever issue is going on.

https://status.anthropic.com/

boesboes 4 days ago | parent | prev | next [-]

Yeah, it has become unusable for me. Maybe it always has been, and I am just trying to solve harder problems with it and am more critical of the results. But it’s still infinitely better than Gemini for me, which can’t do anything useful. It even tried removing the entire security system from my Rails app because it couldn’t figure out how to log in in the tests.

I did a test with a very detailed prompt that specified exactly what to fix and how. Claude did it, but not very well. Gemini? It got stuck in a loop until I told it to stop, I gave it a hint, and then it got stuck again and gave up after trying the exact same thing three more times…

And while Claude managed to get through it, it couldn’t get it right even with some help. It took me 15 minutes to write the prompt, 15 minutes of Claude implementing it, and another 10 trying to get it to do it correctly. It would have taken me about half the time to do it myself, I think.

I am giving up on it for a while.

laborcontract 4 days ago | parent | prev | next [-]

Insert something to the tune of: “never read files in slices. Instead, whenever accessing a file, you must read a file in entirety[..]” at the beginning of every conversation or whenever you’re down to burn more credits/get better results.

A great deal of Claude stupidity is due to context engineering, specifically the fact that it tries its hardest to pick out just the slice of code it needs to fulfill the task.

A lot of the annoying “you’re absolutely right!” responses come from CC incrementally discovering that you have more than 10 lines of code in that file that pertain to your task.

I don’t believe conspiracies about dumbed-down models. It’s all context pruning.

oc1 4 days ago | parent [-]

So Claude Code does the same shit as Cursor?

illusive4080 4 days ago | parent | prev | next [-]

Not for me. It gets worse when context is nearly full. I like to compact or clear context more often than it does automatically.

yumraj 4 days ago | parent | next [-]

I’ve thought about that but always forget, good to know it helps.

I wish there were a way to persist in-memory context in a file automatically, say on each compact or git commit. Yesterday CC crashed and restarting it and feeding it all the context was a pain since my updated Claude.md file was a couple of days old. It literally went from a Sr Engineer to a Jr post crash.

jswny 3 days ago | parent [-]

You can do that with hooks! Make a small script that triggers on a commit tool use or a compact hook, reads the conversation file (its path should be available via the inputs to the hook), and backs it up somewhere.
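Something like this rough Python sketch could be the script. It is not an official recipe. It assumes the hook receives a JSON payload on stdin and that the payload names the conversation file in a `transcript_path` field, so check the hooks docs for your version. The backup directory and function names are made up for illustration.

```python
import json
import shutil
import sys
import time
from pathlib import Path

# Hypothetical backup location; pick whatever suits your setup.
DEFAULT_BACKUP_DIR = Path.home() / ".claude-transcript-backups"


def backup_transcript(hook_input, backup_dir=DEFAULT_BACKUP_DIR):
    """Copy the conversation file named in the hook input into backup_dir.

    Returns the path of the copy, or None if there was nothing to copy.
    """
    if not isinstance(hook_input, dict):
        return None
    # Assumption: the payload names the conversation file in "transcript_path".
    raw_path = hook_input.get("transcript_path")
    if not raw_path:
        return None
    source = Path(raw_path).expanduser()
    if not source.is_file():
        return None
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp prefix so repeated backups of one session do not overwrite each other
    # (unless two backups land within the same second).
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{stamp}-{source.name}"
    shutil.copy2(source, target)
    return target


def main():
    """Read the hook payload (JSON) from stdin and back up the transcript."""
    try:
        payload = json.load(sys.stdin)
    except json.JSONDecodeError:
        payload = {}
    backup_transcript(payload)
    return 0  # always report success so a failed backup never blocks the session


# When installing this as a hook script, end the file with:
#     raise SystemExit(main())
```

You would then register the script in your hooks settings for the compact event (and, if you want, for tool uses that run `git commit`). Restoring is manual: point the new session at the most recent copy.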

nico 4 days ago | parent | prev [-]

Do you do this via settings or just keep track of it and manually ask it to do it more often?

furyofantares 4 days ago | parent [-]

(Not the person you're responding to, but) It says how close it is to compacting in the bottom right, once it's getting close at least (30% left or something?)

Whenever I see that I think about whether I can find a good point to compact or clear. I also just try to clear whenever it makes sense to avoid getting there and try to give smaller tasks that can be cleared after they're done when possible.

Oh, I guess one thing I do is sometimes have it write a file with what was done, if I'm not actually sure if I want to clear or might want to come back to it. I also sometimes do this rather than compact during a large task - document status and clear.

audinobs 4 days ago | parent | prev | next [-]

I think it's like a gambling game, where you get hot and cold streaks that are just runs of chance.

The model feels like it has gotten stupid when you hit a cold streak after a hot hand.

slantaclaus 4 days ago | parent | prev [-]

I feel like it’s gotten better recently