| ▲ | matheusmoreira 5 hours ago |
| That analysis is pretty brutal. It's very disconcerting that they can sell access to a high-quality model, then stealthily degrade it over time, effectively pulling the rug out from under their customers. |
|
| ▲ | riskassessment 4 hours ago | parent | next [-] |
| Stealthily degrade the model, or stealthily constrain it with a tighter harness? Coding tools like Claude Code were created to overcome the shortcomings of last year's models. The models have gotten better, but the harnesses have not been rebuilt from scratch to reflect the improved planning and tool use inherent to newer models. I wonder how much of the engineering put into these coding tools may, in some cases, actually degrade coding performance relative to simpler instructions and terminal access. Not to mention that the monthly subscription pricing structure incentivizes building the harness to reduce token use; how much of that token efficiency is to the benefit of the user? Someone needs to be doing research comparing, e.g., Claude Code vs. generic code assist via API access with minimal tooling and instructions. |
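The baseline proposed above can be framed concretely. This is a minimal sketch of "generic code assist with minimal tooling": an agent loop with exactly one tool (terminal access) and no further harness. The model call is stubbed out here so the sketch is self-contained; every function name is hypothetical, and a real comparison would swap `stub_model` for a raw API call.

```python
import subprocess

def run_shell(command: str) -> str:
    """The single tool: run a shell command and return its output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr

def stub_model(history: list) -> dict:
    """Stand-in for a raw model API call. Issues one tool call, then stops."""
    if len(history) == 1:
        return {"tool": "shell", "input": "echo hello"}
    return {"answer": "done"}

def minimal_agent(task: str, model=stub_model) -> str:
    """The entire 'harness': instruction in, tool loop, answer out."""
    history = [{"role": "user", "content": task}]
    while True:
        reply = model(history)
        if "answer" in reply:
            return reply["answer"]
        # Execute the requested command and feed the output back verbatim,
        # with no summarization or token-saving compression in between.
        output = run_shell(reply["input"])
        history.append({"role": "tool", "content": output})
```

The point of such a baseline is that everything the model sees stays in its own context, so any performance gap against a full harness can be attributed to the harness itself.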
| |
| ▲ | nrds 4 hours ago | parent | next [-] | | I've been using pi.dev since December. The only significant change to the harness in that time which affects my usage is the availability of parallel tool calls, yet Claude models have become unusable in the past month for many of the reasons observed here. Conclusion: it's not the harness. I tend to agree about the legacy workarounds being actively harmful, though. I tried out the Zed agent for a while and I was SHOCKED at how bad its edit tool is compared to the search-and-replace tool in pi; I didn't find a single frontier model capable of using it reliably. By forking edits off to a subagent, it completely decouples models' thinking from their edits and then erases the evidence from their context. Agents ended up believing that a less capable subagent was making editing mistakes. | | |
| ▲ | copperx 2 hours ago | parent | next [-] | | Are you using Pi with a cloud subscription, or are you using the API? | |
| ▲ | jfim 3 hours ago | parent | prev [-] | | Out of curiosity, what can parallel tool calls do that one can't do with parallel subagents and background processes? |
| |
| ▲ | NooneAtAll3 2 hours ago | parent | prev | next [-] | | I feel like a "feature/model freeze" may be justified. Just call it something like "[month] [year] edition" and work on the next release. Users spend effort arriving at a narrow peak of performance, but every change keeps moving the peak sideways. | |
| ▲ | jmount 4 hours ago | parent | prev | next [-] | | Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM. | | |
| ▲ | lelanthran 3 hours ago | parent [-] | | > Love your point. Instructions found to be good by trial and error for one LLM may not be good for another LLM. Well, according to this story, instructions refined by trial and error over months might be good for one LLM on Tuesday, and then be bad for the same LLM on Wednesday. |
| |
| ▲ | robwwilliams 4 hours ago | parent | prev [-] | | Agree: it is Anthropic's aggressive changes to the harnesses and to the hidden base prompt that we users do not see, clearly intended to give long-right-tail users a haircut. |
|
|
| ▲ | mikepurvis 4 hours ago | parent | prev | next [-] |
| Disconcerting for sure, but from a business point of view you can understand where they're at; afaiui they're still losing money on basically every query, and are simultaneously under huge pressure to show that they can (a) deliver this product sustainably at (b) a price point that will be affordable to basically everyone (e.g., similar market penetration to smartphones). The constraints of (b) keep them from raising the price, so that means meeting (a) by making the product worse, and maybe eventually doing a price-discrimination play with premium tiers that are faster and smarter for 10x the cost. But anything done now that erodes the market's trust in their delivery makes that eventual premium tier a harder sell. |
| |
| ▲ | willis936 4 hours ago | parent [-] | | They'll never get anyone on board if the product can't be trusted not to suck. And idk about the pricing thing: right now I waste multiple dollars on a 40-minute response that is useless. Why would I ever use this product? | | |
| ▲ | matheusmoreira 3 hours ago | parent [-] | | Yeah. I've been enjoying programming with Claude so much I started feeling the need to upgrade to Max. Then it turns out even big companies paying API premiums are getting an intentionally degraded and inferior model. I don't want to pay for Opus if I can't trust what it says. |
|
|
|
| ▲ | the__alchemist 4 hours ago | parent | prev | next [-] |
| ChatGPT has been doing the same thing consistently for years. A model starts out smooth: responses take a while, but the results are (relatively) good. Within a few weeks, responses start arriving much more quickly, at poorer quality. |
| |
| ▲ | beering 3 hours ago | parent [-] | | People have been complaining about this since GPT-4 and have never been able to provide any evidence (even though they have all their old conversations in their chat history). I think it's simply new-model shininess turning into raised expectations after some amount of time. | | |
| ▲ | quietsegfault 3 hours ago | parent [-] | | I agree with you. I too complain about this same phenomenon with my colleagues, and we always arrive at the same conclusion: it’s probably us just expecting more and more over time. |
|
|
|
| ▲ | quikoa 22 minutes ago | parent | prev | next [-] |
| Perhaps the subscription part of the business is so heavily subsidized that they have no choice but to cut costs. |
|
| ▲ | ambicapter 4 hours ago | parent | prev | next [-] |
| First time interacting with a corporation in America? |
| |
| ▲ | matheusmoreira 4 hours ago | parent [-] | | With an AI corporation, yes. I subscribed during the promotional 2x usage period. Anthropic's reputation as a more ethical alternative to OpenAI factored heavily in that decision. I'm very disappointed. | | |
|
|
| ▲ | nyeah 4 hours ago | parent | prev | next [-] |
| It's disconcerting. But in 2026 it's not very surprising. |
|
| ▲ | redhed 4 hours ago | parent | prev | next [-] |
| It seems likely to me that they are moving compute power to the new models they are creating. |
|
| ▲ | 01284a7e 5 hours ago | parent | prev | next [-] |
| Seems like the logical conclusion, no matter what. |
|
| ▲ | SpicyLemonZest 4 hours ago | parent | prev | next [-] |
| I still think it's a live possibility that there's simply a finite latent space of tasks each model is amenable to, and models seem to get worse as we mine it out. (The source link claims this is associated with "the rollout of thinking content redaction", but also that observable symptoms began before that rollout, so I wouldn't particularly trust its diagnosis even without the LLM psychosis bit at the end.) |
|
| ▲ | tmpz22 4 hours ago | parent | prev | next [-] |
| > effectively pulling the rug from under their customers. This is the whole point of AI. It's a black box that they can completely control. |
| |
| ▲ | matheusmoreira 4 hours ago | parent [-] | | I hope local models advance to the point they can match Opus one day... | | |
| ▲ | zozbot234 3 hours ago | parent | next [-] | | If OP is correct, Opus has regressed to a point where local models are already on par with it. | |
| ▲ | NinjaTrance 3 hours ago | parent | prev | next [-] | | Considering the advances in software and hardware, I would expect that in 2 or 3 years. And I hope we will eventually reach a point where models become "good enough" for certain tasks, and we won't have to replace them every 6 months. (That would be similar to the evolution of other technologies like personal computers and smartphones.) | |
| ▲ | addandsubtract 4 hours ago | parent | prev [-] | | We've been saying this since ChatGPT 3. People will never be content with local models. |
|
|
|
| ▲ | NinjaTrance 4 hours ago | parent | prev | next [-] |
| [dead] |
|
| ▲ | halfcat 4 hours ago | parent | prev [-] |
| If you think that’s brutal, wait until you hear about how fiat currency works. |