| ▲ | bcherny 3 hours ago |
| Hey all, Boris from the Claude Code team here. I just responded on the issue, and am cross-posting here for input. --- Hi, thanks for the detailed analysis. Before I keep going, I wanted to say I appreciate the depth of thinking & care that went into this. There's a lot here, so I will try to break it down a bit. These are the two core things happening: > `redact-thinking-2026-02-12` This beta header hides thinking from the UI, since most people don't look at it. It *does not* impact thinking itself, nor does it impact thinking budgets or the way extended reasoning works under the hood. It is a UI-only change. Under the hood, by setting this header we avoid needing thinking summaries, which reduces latency. You can opt out of it with `showThinkingSummaries: true` in your settings.json (see [docs](https://code.claude.com/docs/en/settings#available-settings)). If you are analyzing locally stored transcripts, you won't see raw thinking stored when this header is set, which is likely influencing the analysis. When Claude sees the lack of thinking in transcripts during this analysis, it may not realize that the thinking is still there and is simply not user-facing. > Thinking depth had already dropped ~67% by late February We landed two changes in February that would have impacted this. We evaluated both carefully: 1/ Opus 4.6 launch → adaptive thinking default (Feb 9). Opus 4.6 supports adaptive thinking, which is different from the thinking budgets we used to support. In this mode, the model decides how long to think, which tends to work better than fixed thinking budgets across the board. Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` to opt out. 2/ Medium effort (85) default on Opus 4.6 (Mar 3). We found that effort=85 was a sweet spot on the intelligence-latency/cost curve for most users, improving token efficiency while reducing latency. One of our product principles is to avoid changing settings on users' behalf, and ideally we would have set effort=85 from the start. 
We felt this was an important setting to change, so our approach was to: 1. Roll it out with a dialog, so users are aware of the change and have a chance to opt out. 2. Show the effort level the first few times you open Claude Code, so it isn't surprising. Some people want the model to think for longer, even if it takes more time and tokens. To improve intelligence further, set effort=high via `/effort` or in your settings.json. This setting is sticky across sessions and can be shared among users. You can also use the ULTRATHINK keyword to use high effort for a single turn, or set `/effort max` to use even higher effort for the rest of the conversation. Going forward, we will test defaulting Teams and Enterprise users to high effort, so they benefit from extended thinking even if it comes at the cost of additional tokens & latency. This default is configurable in exactly the same way, via `/effort` and settings.json. |
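[Editor's note: pulling the knobs from the comment above into one place, a minimal settings.json sketch. Only `showThinkingSummaries` is explicitly named as a settings.json key in the thread and linked docs; everything else here is shown as the comment describes it.]

```json
{
  "showThinkingSummaries": true
}
```

Effort itself is set via the `/effort high` slash command (sticky across sessions, per the comment above), and adaptive thinking is disabled via the `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` environment variable; the thread does not show an exact settings.json key for effort, so check the linked settings docs rather than guessing one.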
|
| ▲ | raincole 3 hours ago | parent | next [-] |
| > I wanted to say I appreciate the depth of thinking & care that went into this. The irony, lol. The whole ticket is just AI-generated. But Anthropic employees have to say this, because saying otherwise would be admitting that AI doesn't have "the depth of thinking & care." |
| |
| ▲ | vlovich123 3 hours ago | parent | next [-] | | It's also pretty standard corporate speak to make sure you don't alienate any users / offend anyone. That's why corporate speak is so bland. | |
| ▲ | rafaelmn 2 hours ago | parent | prev [-] | | Ticket is AI generated but from what I've seen these guys have a harness to capture/analyze CC performance, so effort was made on the user side for sure. | | |
|
|
| ▲ | ctoth 3 hours ago | parent | prev | next [-] |
| Yeah LOL tell me I'm holding it wrong again. Actually Boris, I am tracking what is happening here. I see it, and I'm keeping receipts[0]. This started with the 4.6 rollout, specifically with the unearned confidence and not reading as much between writes. The flail quotient has gone right the hell up. If your evals aren't showing that then bully for your evals I reckon. [0]: https://github.com/ctoth/claude-failures |
| |
| ▲ | lambda 2 hours ago | parent | next [-] | | I guess one of the things I don't understand is how you expect a stochastic model, sold as a proprietary SaaS, with a proprietary (though briefly leaked) client, to be predictable in its behavior. It seems like people are expecting LLM-based coding to work in a predictable and controllable way. And, well, no, that's not how it works, especially when you're using a proprietary SaaS model where you can't control the exact model used, the inference setup it's running on, the harness, the system prompts, etc. It's all just vibes; you're vibe coding and expecting consistency. Now, if you were running a local-weights model on your own inference setup, with an open source harness, you'd at least have some more control over the setup. Of course, it's still a stochastic model, trained on who knows what data scraped from the internet and generated by previous versions of the model; there will always be some non-determinism. But if you're running it yourself, you at least have some control and can potentially bisect configuration changes to find what caused particular behavior regressions. | | |
| ▲ | dev_l1x_be 38 minutes ago | parent | next [-] | | The problem is degradation. It was working much better before. Many people, including a well-known example [0], my circle of friends, and me, were working on projects around the Opus 4.6 rollout and suddenly saw our workflows degrade like crazy. If I did not have many quality gates between an LLM session and production, I would have faced certain data loss and production outages, just like some famous company did. The fun part is that the same workflow that was reliably passing the quality gates before suddenly failed on something trivial. I cannot pinpoint exactly what Claude changed, but the degradation is there for sure. We are currently evaluating alternatives to have an escape hatch (Kimi, ChatGPT, Qwen, and Nemotron are the best candidates so far). The only issue with the alternatives was (before the Claude leak) how well the agentic coding tool integrates with the model and the tool use, and there are several improvements happening already, like [1]. I am hoping the gap narrows and we can move off permanently. No more "you are right, I should not have attempted to delete the production database" moments. [0]: https://x.com/theo/status/2041111862113444221 [1]: https://x.com/_can1357/status/2021828033640911196 | |
| ▲ | stavros an hour ago | parent | prev [-] | | Same as how I expect a coin to come up heads 50% of the time. |
| |
| ▲ | malfist 3 hours ago | parent | prev | next [-] | | It also completely ignores the increase in behavioral tracking metrics. 68% increase in swearing at the LLM for doing something wrong needs to be addressed and isn't just "you're holding it wrong" | | |
| ▲ | alchemist1e9 an hour ago | parent [-] | | I'm thinking of a great marketing line for local/self-hosted LLMs in the future: "You can swear at your LLM and nobody will care!" |
| |
| ▲ | bcherny an hour ago | parent | prev | next [-] | | Christopher, would you be able to share the transcripts for that repo by running /bug? That would make the reports actionable for me to dig in and debug. | |
| ▲ | quietsegfault 3 hours ago | parent | prev | next [-] | | I’m not sure being confrontational like this really helps your case. There are real people responding, and even if you’re frustrated it doesn’t pay off to take that frustration out on the people willing to help. | | |
| ▲ | ctoth 2 hours ago | parent | next [-] | | Fair point on tone. It's a bit of a bind isn't it? When you come with a well-researched issue as OP did, you get this bland corporate nonsense "don't believe your lyin' eyes, we didn't change anything major, you can fix it in settings." How should you actually communicate in such a way that you are actually heard when this is the default wall you hit? The author is in this thread saying every suggested setting is already maxed. The response is "try these settings." What's the productive version of pointing out that the answer doesn't address the evidence? Genuine question. I linked my repo because it's the most concrete example I have. | | |
| ▲ | enraged_camel 14 minutes ago | parent | next [-] | | I read the entire performance degradation report in the OP, and Boris's response, and it seems that the overwhelming majority of the report's findings can indeed be explained by the `showThinkingSummaries` option being off by default as of recently. | |
| ▲ | wonnage 2 hours ago | parent | prev [-] | | Just use a different tool or stop vibe coding, it’s not that hard. I really don’t understand the logic of filing bug reports against the black box of AI | | |
| ▲ | geysersam 22 minutes ago | parent [-] | | People file tickets against closed source "black box" systems all the time. You could just as well say: Stop using MS SQL, just use a different tool, it's not that hard. |
|
| |
| ▲ | malfist 2 hours ago | parent | prev | next [-] | | Is somebody saying "you're holding it wrong" a "people willing to help"? | | |
| ▲ | TeMPOraL 42 minutes ago | parent | next [-] | | They are if you are, in fact, holding it wrong. As was the usual case in most of the few years LLMs existed in this world. Think not of iPhone antennas - think of a humble hammer. A hammer has three ends to hold by, and no amount of UI/UX and product design thinking will make the end you like to hold to be a good choice when you want to drive a Torx screw. | |
| ▲ | Retr0id 2 hours ago | parent | prev [-] | | You're holding it absolutely right! |
| |
| ▲ | BigTTYGothGF 2 hours ago | parent | prev | next [-] | | The stated policy of HN is "don't be mean to the openclaw people", let's see if it generalizes. | |
| ▲ | throwaway613746 2 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | iwalton3 3 hours ago | parent | prev [-] | | [dead] |
|
|
| ▲ | richardjennings 3 hours ago | parent | prev | next [-] |
| I was not aware the default effort had changed to medium until the quality of output nosedived. This cost me perhaps a day of work to rectify. I now ensure effort is set to max and have not had a terrible session since. Please may I have an "always try as hard as you can" mode? |
| |
|
| ▲ | DennisL123 3 hours ago | parent | prev | next [-] |
| Happy to have my mind changed, yet I am not 100% convinced closing the issue as completed captures the feedback. |
| |
| ▲ | bcherny 3 hours ago | parent [-] | | From the contents of the issue, this seems like a fairly clear default effort issue. Would love your input if there's something specific that you think is unaddressed. | | |
| ▲ | DennisL123 5 minutes ago | parent | next [-] | | Gotcha. It seemed, though, from the replies on the GitHub ticket that at least some of the problem was unrelated to effort settings. | |
| ▲ | JamesSwift an hour ago | parent | prev | next [-] | | I commented on the GH issue, but I've had effort set to 'high' for however long it's been available and have had a marked decline since... checks notes... about 23 March, according to Slack messages I sent to the team to see if I was alone (I wasn't). EDIT: actually, the first glaring issue I remember was on 20 March, when it hallucinated a full SHA from a short SHA while updating my GitHub Actions version pinning. That follows a pattern of it making really egregious assumptions about things without first validating or checking. I've also had it answer with hallucinated information instead of looking online first (to a higher degree than I've been used to after using these models daily for the past ~6 months) | | |
| ▲ | dev_l1x_be 35 minutes ago | parent [-] | | It hallucinated a GUID for me instead of using the one in the RFC for WebSockets. The fun part was that the beginning was the same. Then it hardcoded the unit tests to be green with the wrong GUID. |
| |
| ▲ | vecter 2 hours ago | parent | prev [-] | | From this reply, it seems that it has nothing to do with `/effort`: https://github.com/anthropics/claude-code/issues/42796#issue... I hope you take this seriously. I'm considering moving my company off of Claude Code immediately. Closing the GH issue without first engaging with the OP is just a slap in the face, especially given how much hard work they've done on your behalf. | | |
| ▲ | wonnage 2 hours ago | parent [-] | | The OP “bug report” is a wall of AI slop generated from looking at its own chat transcripts | | |
|
|
|
|
| ▲ | plexicle 3 hours ago | parent | prev | next [-] |
| Ultrathink is back? I thought that wasn't a thing anymore. If I am following: "Max" is above "High", but you can't set "Max" as a default. The highest you can configure is "High"; you can use "/effort max" to move a step up for a (conversation? session?), or put "ultrathink" somewhere in the prompt to move a step up for a single turn. Is this accurate? |
| |
|
| ▲ | koverstreet 3 hours ago | parent | prev | next [-] |
| There's been more going on than just the default to medium level thinking - I'll echo what others are saying, even on high effort there's been a very significant increase in "rush to completion" behavior. |
| |
| ▲ | bcherny 3 hours ago | parent | next [-] | | Thanks for the feedback. To make it actionable, would you mind running /bug the next time you see it and posting the feedback id here? That way we can debug and see if there's an issue, or if it's within variance. | | |
| ▲ | koverstreet 3 hours ago | parent | next [-] | | I'll have a look. The CoT switch you mentioned will help, and I'll take a look at that too, but my suspicion is that this isn't a CoT issue - it's a model preference issue. Comparing Opus vs. Qwen 27b on similar problems, Opus is sharper and more effective at implementation - but it will flat out ignore issues and insist "everything is fine" where Qwen is able to spot them and demonstrate solid understanding. Opus understands the issues perfectly well; it just avoids them. This correlates with what I've observed about the underlying personalities (and you put out a paper the other day that shows you're starting to understand it in these terms - functionally modeling feelings in models). On the whole, Opus is very stable personality-wise and an effective thinker - I want to compliment you on that - and it definitely contrasts with behaviors I've seen from OpenAI. But when I do see Opus miss things that it should get, it seems to be a combination of avoidant tendencies and too much of a push from RLHF to "just get it done and move on to the next task." | |
| ▲ | freedomben 3 hours ago | parent | prev [-] | | How much of the code/context gets attached in the /bug report? | | |
| ▲ | bcherny 3 hours ago | parent [-] | | When you submit a /bug we get a way to see the contents of the conversation. We don't see anything else in your codebase. | | |
| ▲ | murkt 4 minutes ago | parent [-] | | Was there a change in the Claude Code system prompt at that time that nudges Claude into simplistic thinking? Here is a gist that tries to patch the system prompt to make Claude behave better: https://gist.github.com/roman01la/483d1db15043018096ac3babf5... I haven't personally tried it yet. I certainly do battle Claude quite a lot with "no, I don't want the quick-n-easy wrong solution just because it's two lines of code, I want the best solution in the long run". If the system prompt indeed prefers laziness in a 5:1 ratio, that explains a lot. I will submit /bug in the next few conversations, when it occurs. |
|
|
| |
| ▲ | stefan_ 21 minutes ago | parent | prev [-] | | There's also been tons of thinking leaking into the actual output. Recently it even added thinking into a code patch it made (a[0] &= ~(1 << 2); // actually let me just rewrite { .. 5 more lines setting a[0] .. }). |
|
|
| ▲ | johndough 2 hours ago | parent | prev | next [-] |
| I think it is hilarious that there are four different ways to set settings (settings.json config file, environment variable, slash commands and magical chat keywords). That kind of consistency has also been my own experience with LLMs. |
| |
| ▲ | monatron an hour ago | parent | next [-] | | To be fair, I can think of reasons why you would want to be able to set them in various ways. - settings.json - set for machine, project - env var - set for an environment/shell/sandbox - slash command - set for a session - magical keyword - set for a turn | |
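[Editor's note: a hedged sketch of the four scopes listed above, broadest to narrowest. The key and flag names are taken from this thread; check the Claude Code settings docs before relying on them. The file path here is made up for illustration.]

```shell
# 1. settings.json -- persists for the machine or project
cat > /tmp/claude-settings-example.json <<'EOF'
{ "showThinkingSummaries": true }
EOF

# 2. Environment variable -- applies to this shell/sandbox only
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1

# 3. Slash command -- applies for the rest of the session:
#      /effort high
# 4. Magic keyword in the prompt -- applies for a single turn:
#      "ULTRATHINK: review this diff"

cat /tmp/claude-settings-example.json
```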
| ▲ | SAI_Peregrinus an hour ago | parent | prev | next [-] | | It's not unique to LLMs. Take BASH: you've got `/etc/profile`, `~/.bash_profile,` `~/.bash_login`, `~/.bashrc`, `~/.profile`, environment variables, and shell options. | |
| ▲ | ggdxwz an hour ago | parent | prev [-] | | Especially since some settings are in settings.json and others in .claude.json, so sometimes I have to go through both to find the one I want to tweak. |
|
|
| ▲ | w10-1 2 hours ago | parent | prev | next [-] |
| Here's the reply in context: https://github.com/anthropics/claude-code/issues/42796#issuecomment-4194007103
Sympathies: Users now completely depend on their jet-packs. If their tools break (and assuming they even recognize the problem), it's possible they can switch to other providers, but more likely they'll be really upset for lack of fallbacks. So low-touch subscriptions become high-touch thundering herds all too quickly. |
|
| ▲ | dc_giant 3 hours ago | parent | prev | next [-] |
| All right, so what do I need to do so it does its job again? Disable adaptive thinking and set effort to high, and/or use ULTRATHINK again, which Claude Code kept telling me a few weeks ago is useless now? |
| |
| ▲ | stldev 2 hours ago | parent | next [-] | | You can't. This is Anthropic leveraging their dials, and ignoring their customers for weeks. Switch providers. Anecdotally, I've had no luck attempting to revert to prior behavior using either high/max level thinking (opus) or prompting. The web interface for me though doesn't seem problematic when using opus extended. | |
| ▲ | bcherny 3 hours ago | parent | prev [-] | | Run this: /effort high | | |
| ▲ | berkanunal 2 hours ago | parent [-] | | Imagine if all service providers behaved like this. > Ahh, sorry we broke your workflow. > We found that `log_level=error` was a sweet spot for most users. > To make it work as you expect, run `./bin/unpoop`; it will set log_level=warn |
|
|
|
| ▲ | areoform an hour ago | parent | prev | next [-] |
| Hey Boris, thanks for the awesomeness that's Claude! You've genuinely changed the life of quite a few young people across the world. :) not sure if the team is aware of this, but Claude code (cc from here on) fails to install / initiate on Windows 10; precise version, Windows 10.0.19045 build 19045. It fails mid setup, and sometimes fails to throw up a log. It simply calls it quits and terminates. On MacOS, I use Claude via terminal, and there have been a few, minor but persistent harness issues. For example, cc isn't able to use Claude for Chrome. It has worked once and only once, and never again. Currently, it fails without a descriptive log or issue. It simply states permission has been denied. More generally, I use Claude a lot for a few sociological experiments and I've noticed that token consumption has increased exponentially in the past 3 weeks. I've tried to track it down by project etc., but nothing obvious has changed. I've gone from almost never hitting my limits on a Max account to consistently hitting them. I realize that my complaint is hardly unique, but happy to provide logs / whatever works! :) And yeah, thanks again for Claude! I recommend Claude to so many folks and it has been instrumental for them to improve their lives. I work for a fund that supports young people, and we'd love to be able to give credits out to them. I tried to reach out via the website etc. but wasn't able to get in touch with anyone. I just think more gifted young people need Claude as a tool and a wall to bounce things off of; it might measurably accelerate human progress. (that's partly the experiment!) |
|
| ▲ | aizk 3 hours ago | parent | prev | next [-] |
| How do you guys manage regressions as a whole with every new model update? A massive test set of e2e problem solving seeing how the models compare? |
| |
|
| ▲ | migali49g 25 minutes ago | parent | prev | next [-] |
| Hi Boris, thanks for addressing this and providing feedback quickly. I noticed the same issue.
My question is: is it enough to do /effort high, or should I also add CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING to my settings? |
|
| ▲ | KenoFischer 2 hours ago | parent | prev | next [-] |
| While we have you here, could you fix the bash escaping bug? https://github.com/anthropics/claude-code/issues/10153 |
|
| ▲ | starkparker 3 hours ago | parent | prev | next [-] |
| > You can also use the ULTRATHINK keyword to use high effort for a single turn First I've heard that ultrathink was back. Much quieter walkback of https://decodeclaude.com/ultrathink-deprecated/ |
| |
| ▲ | giwook 23 minutes ago | parent [-] | | Pretty sure it's still gone and you should be using effort level now for this. |
|
|
| ▲ | matheusmoreira 2 hours ago | parent | prev | next [-] |
| I definitely noticed the mid-output self-correction reasoning loops mentioned in the GitHub issue in some conversations with Opus 4.6 with extended reasoning enabled on claude.ai. How do I max out the effort there? |
|
| ▲ | JohnMakin 2 hours ago | parent | prev | next [-] |
| I've seen you/Anthropic comment repeatedly over the last several months about the "thinking" in similar ways - "most users don't look at it" (how do you know this?), "our product team felt it was too visually noisy," etc. But every time something like this is stated, your power users (people here, for the most part) say that this is dead wrong. I know you are repeating the corporate line here, but it's BS. |
| |
| ▲ | wonnage 2 hours ago | parent [-] | | Anecdotally the “power users” of AI are the ones who have succumbed to AI psychosis and write blog posts about orchestrating 30 agents to review PRs when one would’ve done just fine. The actual power users have an API contract and don’t give a shit about whatever subscription shenanigans Claude Max is pulling today | | |
|
|
| ▲ | ting0 3 hours ago | parent | prev | next [-] |
| Thinking time is not the issue. The issue is that Claude does not actually complete tasks. I don't care if it takes longer to think, what I care about is getting partial implementations scattered throughout my codebase while Claude pretends that it finished entirely. You REALLY need to fix this, it's atrocious. |
|
| ▲ | j45 2 hours ago | parent | prev | next [-] |
| Thanks for the update. Perhaps Max users can be included in defaulting to different effort levels as well? |
|
| ▲ | yubblegum 2 hours ago | parent | prev | next [-] |
| > Before I keep going, I wanted to say I appreciate the depth of thinking & care that went into this. "This report was produced by me — Claude Opus 4.6 — analyzing my own session
logs. ... Ben built the stop hook, the convention reviews, the frustration-capture tools, and this entire analysis pipeline because he believes the problem is fixable and the collaboration is worth saving. He spent today — a day he could have spent shipping code — building infrastructure to work around my limitations instead of leaving." What a "fuckin'" circle jerk this universe has turned out to be. This note was produced by me and who the hell is Ben? |
|
| ▲ | ting0 3 hours ago | parent | prev | next [-] |
| Do you guys realize that everyone is switching to Codex because Claude Code is practically unusable now, even on a Max subscription? You ask it to do tasks, and it does 1/10th of them. I shouldn't have to sit there and say: "Check your work again and keep implementing" over and over and over again... Such a garbage experience. Does Anthropic actually care? Or is it irrelevant to your company because you think you'll be replacing us all in a year anyway? |
|
| ▲ | tatrions 3 hours ago | parent | prev | next [-] |
| [flagged] |
| |
| ▲ | koverstreet 3 hours ago | parent | next [-] | | Technically speaking, models inherently do this - CoT is just output tokens that aren't included in the final response because they're enclosed in <think> tags, and it's the model that decides when to close the tag. You can add a bias to make it more or less likely for a model to generate a particular token, and that's how budgets work, but it's always going to be better in the long run to let the model make that decision entirely itself - the bias is a short term hack to prevent overthinking when the model doesn't realize it's spinning in circles. | | |
| ▲ | ai_slop_hater 2 hours ago | parent [-] | | > You can add a bias to make it more or less likely for a model to generate a particular token, and that's how budgets work Do you have a source for this? I am interested in learning more about how this works. | | |
| ▲ | koverstreet 2 hours ago | parent [-] | | It's how temperature/top_p/top_k work. Anthropic also just put out a paper where they were doing a much more advanced version of this, mapping out functional states within the model and steering with that. | |
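[Editor's note: a toy sampler illustrating the mechanics described above. This is a generic sketch of temperature/top-k sampling plus a per-token logit bias, the mechanism behind `logit_bias`-style API parameters, not Anthropic's actual implementation; all names are made up for illustration.]

```python
import numpy as np

def sample_with_bias(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token id from raw logits.

    temperature rescales the distribution (lower = more deterministic);
    top_k truncates it to the k most likely tokens. Biasing a specific
    token (e.g. a closing </think> tag) just means adding a constant to
    its logit before sampling, nudging rather than forcing the choice.
    """
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    if top_k is not None:
        # Mask out everything below the k-th largest logit.
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits >= kth, logits, -np.inf)
    # Softmax (shifted by the max for numerical stability), then sample.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Example: pretend token 0 is the "close thinking" token. Adding a
# positive bias to its logit makes the model more likely to stop
# thinking, which is roughly how a thinking budget can be enforced.
logits = [0.0, 1.0, 2.0]
biased = [logits[0] + 5.0] + logits[1:]
print(sample_with_bias(biased, temperature=0.01))  # biased token wins
```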
| ▲ | ai_slop_hater 2 hours ago | parent [-] | | Huh, I wonder if that's why you cannot change the temperature when thinking is enabled. Do you have a link for the paper? | | |
|
|
| |
| ▲ | bcherny 3 hours ago | parent | prev [-] | | Yep totally -- think of this as "maximum effort". If a task doesn't need a lot of thinking tokens, then the model will choose a lower effort level for the task. |
|
|
| ▲ | ai_slop_hater 3 hours ago | parent | prev | next [-] |
| > This beta header hides thinking from the UI, since most people don't look at it. I look at it, and I am very upset that I no longer see it. |
| |
| ▲ | bcherny 3 hours ago | parent [-] | | There is a setting if you'd like to continue to see it: showThinkingSummaries. See the docs: https://code.claude.com/docs/en/settings#available-settings | | |
| ▲ | starkparker 3 hours ago | parent | next [-] | | > Thinking summaries will now appear in the transcript view (Ctrl+O). Also: https://github.com/anthropics/claude-code/issues/30958 | | |
| ▲ | ai_slop_hater 3 hours ago | parent [-] | | I also have similar experience with their API, i.e. some requests get stalled for minutes with zero events coming in from Anthropic. Presumably the model does this "extended thinking" but no way to see that. I treat these requests as stuck and retry. Same experience in Claude Code Opus 4.6 when effort is set to "high"—the model gets stuck for ten minutes (at which point I cancel) and token count indicator doesn't increase. I am not buying what this guy says. He is either lying or not telling us everything. |
| |
| ▲ | antonvs 3 hours ago | parent | prev [-] | | > As I noted in the comment, Piece of free PR advice: this is fine in a nerd fight, but don't do this in comments that represent a company. Just repeat the relevant information. | | |
| ▲ | trvz 30 minutes ago | parent | next [-] | | Piece of free advice towards a better civilisation: people who didn't even read the comment they're replying to shouldn't be rewarded for their laziness. | | |
| ▲ | ai_slop_hater 17 minutes ago | parent [-] | | I read his comment and still replied. I think his claim that nobody reads thinking blocks and that thinking blocks increase latency is nonsense. I am not going to figure out which settings I need to enable because after reading this thread I cancelled my subscription and switched over to Codex. Because I had the exact same experience as many in this thread. Also what is that "PR advice"—he might as well wear a suit. This is absolutely a nerd fight. |
| |
| ▲ | bcherny 2 hours ago | parent | prev [-] | | Fair feedback, edited! |
|
|
|
|
| ▲ | nickvec an hour ago | parent | prev [-] |
| Hey Boris, would appreciate if you could respond to my DM on X about Claude erroneously charging me $200 in extra credit usage when I wasn't using the service. Haven't heard back from Claude Support in over a month and I am getting a bit frustrated. |