One thing I see no one asking: is this not a case of optimization? Hidden reasoning means they don't need to process the output of all that; it stays internal to the model. Less cost for them -> less cost for us (even if they benefit more), compared to streaming all of those reasoning tokens out?

j4k0bfr a day ago | parent [-]

My understanding was that thinking still gets encrypted, shared with clients, and reingested by Anthropic with each new prompt [1]. If so, it would cost more than normal tokens, since it has to be decrypted and re-encrypted with every transaction.

[1] https://blog.cryptographyengineering.com/2026/05/29/fooling-...

Edit: other comments under this post seem to indicate that thinking tokens are cached on the server side as well? I'm a bit confused.

	cma 20 hours ago | parent [-]
		I think the reason it's encrypted is so that if you continue a session after it has fallen out of cache, it can be reingested. And I think all the output is signed or something as well, so that you can't modify the agent's response in your submission, which would open up many more model jailbreaks. For local LLMs it's really powerful to be able to modify the model's response to save tokens when it gets something wrong, or at least it was when they were a lot dumber.
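
To make the "signed so you can't modify it" idea concrete, here is a minimal sketch of how a server could detect client-side edits to a block it previously emitted. This is only an illustration of the general technique (an HMAC over the text); nothing in the thread confirms that any provider does it this way, and `SERVER_KEY`, `sign`, and `verify` are hypothetical names made up for the example.

```python
import hashlib
import hmac

# Assumption: a secret known only to the provider, never sent to clients.
SERVER_KEY = b"hypothetical-server-secret"


def sign(block: str) -> str:
    """Return a hex HMAC-SHA256 tag for a block of model output."""
    return hmac.new(SERVER_KEY, block.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(block: str, tag: str) -> bool:
    """Check that a block sent back by the client matches the tag issued for it."""
    return hmac.compare_digest(sign(block), tag)


# The server emits a block together with its tag.
original = "model reasoning or response text"
tag = sign(original)

# Later the client resubmits the conversation history.
ok_unmodified = verify(original, tag)
ok_tampered = verify(original + " (edited by client)", tag)
```

In this sketch the unmodified block verifies and the edited one does not, which is the property described above: the client can hold and replay the text but cannot alter it without the server noticing. Signing alone does not hide the content; hiding it would take encryption on top.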