JamesSwift 6 hours ago

I envy your experience. It's driving me crazy on a near-daily basis now.

wombat-man 6 hours ago | parent | next [-]

I'm still getting pretty good code out of it, but I only use it on side projects. Is the issue with their odd limit system?

JamesSwift 6 hours ago | parent [-]

I'm on pay-per-use plans, so it's not the limits that are the issue directly, although the product development process could lead to them trying to fix limit issues and breaking the product as a whole.

The main issue seems to be side effects of effort/thinking. It hallucinates at a much higher rate and skips research in a ton of edge cases, even with effort set to MAX and adaptive thinking disabled, even on 4.6. I've said it before, but using opus today feels like using sonnet from the ~October timeframe. It's not anywhere near what opus 4.5 in January felt like, or even opus 4.6 on release (notably, 4.6 on release _really_ over-researched even simple tasks, and that behavior is almost entirely gone now even with max effort, so they are definitely re-tuning these things on the fly and degrading the experience as a result).

EDIT: I also have a very high suspicion that the way they hydrate thinking is buggy and/or lossy (or maybe unintentionally lossy, which leads to bugs). So many behaviors just make no sense at the level I have my setup tuned (I have everything set to "just charge me the most money to hopefully get the best results"), given that I haven't changed anything while using it daily for months and months on end, yet have been getting worse and worse results.

hungryhobbit 5 hours ago | parent | next [-]

Claude has definitely gotten stupider (even on the latest Opus).

I used to be able to give it certain commands, and reliably count on it to do the right thing. Lately I give it identical commands and it just starts doing something idiotic, instead of the correct thing (that it did 50 times prior).

To an earlier poster's point, it's probably the model, not the harness, and I understand Anthropic has to make money someday (and they're not now) ... but I'd rather see a visible doubling of price than a secret halving of the capabilities (which seems to be their current plan).

That approach is enshittification.

wombat-man 5 hours ago | parent | prev [-]

Yeah, I have found worse results if I don't leave it on the highest setting. I have gotten by with Pro and a little overage buffer so far. I have found it working pretty well for what I'm using it for, but I have really only been using it for a couple of months now.

dgellow 6 hours ago | parent | prev [-]

But are you saying the harness is driving you insane? Or the model? Because Bun is only the harness, and that part has been improving over time if you stay on the stable channel.

JamesSwift 6 hours ago | parent | next [-]

It's the product overall, and it's impossible to say where the issues are, but I tend to think it's not the model, since the changes seem to be able to occur overnight. So likely a combo of harness and service layer.

redsocksfan45 5 hours ago | parent | prev [-]

Which of the two is responsible for it ignoring being in Plan mode and trying to implement shit instead of just writing up a plan?

dgellow 4 hours ago | parent [-]

The model I believe. Also a pet peeve of mine…