| ▲ | stingraycharles 8 hours ago |
| While the caveman stuff is obviously not serious, there is a lot of legitimate research in this area, which means yes, you can actually influence this quite a bit. The paper “Compressed Chain of Thought”, for example, shows it’s really easy to make significant reductions in reasoning tokens without affecting output quality. There is not much research into this yet (about 5 papers in total), but with it, output tokens can be reduced by about 60%. Given that output is an incredibly significant part of the total cost, this is important. https://arxiv.org/abs/2412.13171 |
|
| ▲ | altruios 7 hours ago | parent | next [-] |
| Who would have suspected that the companies selling 'tokens' would (unintentionally) train their models to prefer longer answers, reaping a HIGHER ROI (the thing a publicly traded company is supposedly legally required to pursue: good thing these are all still private...)? Because it's not like private companies want to make money...
| |
| ▲ | stingraycharles 5 hours ago | parent | next [-] | | I don’t think this is a plausible argument: they’re generally capacity-constrained, and everyone would like shorter (= faster) responses. I’m fairly certain that in a few more releases we’ll have models with shorter CoT chains. Whether they’ll still let us see those is another question, as it seems like Anthropic wants to start hiding their CoT, potentially because it reveals some secret sauce. | |
| ▲ | fancyfredbot 2 hours ago | parent | prev | next [-] | | Try setting up one laundry which charges by the hour and washes clothes really really slowly, and another which washes clothes at normal speed at cost plus some margin similar to your competitors. The one which maximizes ROI will not be the one you rigged to cost more and take longer. | | |
| ▲ | sebastiennight 5 minutes ago | parent [-] | | I don't think the analogy is correct here. Directionally, tokens are not a measure of "time spent processing your query", but rather a measure of effort/resources expended to process your query. So a more germane analogy would be: what if you set up a laundry which charges you based on the amount of laundry detergent used to clean your clothes? Sounds fair. But then, what if the top engineers at the laundry offered an "auto-dispenser" that uses extremely advanced algorithms to apply just the right optimal amount of detergent for each wash? Sounds like added value for the customer. ... but now you end up with a system where the laundry management team has strong incentives to influence how liberally the auto-dispenser will "spend" to give you "best results". |
| |
| ▲ | gwern 4 hours ago | parent | prev [-] | | LLM APIs sell on value they deliver to the user, not the sheer number of tokens you can buy per $. The latter is roughly labor-theory-of-value levels of wrong. |
|
|
| ▲ | ACCount37 8 hours ago | parent | prev | next [-] |
| Some labs do it internally because RLVR is very token-expensive. But it degrades CoT readability even more than normal RL pressure does. It isn't free either - by default, models learn to offload some of their internal computation into the "filler" tokens. So reducing raw token count always cuts into reasoning capacity somewhat. Getting closer to "compute optimal" while reducing token use isn't an easy task. |
| |
| ▲ | stingraycharles 7 hours ago | parent [-] | | Yeah, the readability suffers, but as long as the actual output (i.e. the non-CoT part) stays unaffected, it’s reasonably fine. I work on a few agentic open source tools, and the interesting thing is that once I implemented these things, the overall feedback was a performance improvement rather than a performance reduction, as the LLM would spend much less time generating tokens. I didn’t implement it fully; just a few basic instructions like “reduce prose while thinking, don’t repeat your thoughts” etc. would already yield massive improvements. |
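A minimal sketch of the kind of prompt-level mitigation described above, assuming an OpenAI-style chat `messages` format; the rule wording, `BREVITY_RULES` list, and `build_system_prompt` helper are illustrative, not from any specific tool:

```python
# Hypothetical sketch: appending CoT-brevity instructions to a system prompt.
# The rule text here is illustrative; the comment only mentions instructions
# along the lines of "reduce prose while thinking, don't repeat your thoughts".

BREVITY_RULES = [
    "Reduce prose while thinking: use terse notes, not full sentences.",
    "Do not repeat a thought you have already written.",
    "Skip restating the question before answering.",
]

def build_system_prompt(base_prompt: str) -> str:
    """Append chain-of-thought brevity rules to an existing system prompt."""
    rules = "\n".join(f"- {rule}" for rule in BREVITY_RULES)
    return f"{base_prompt}\n\nWhen reasoning step by step:\n{rules}"

# Usage with an OpenAI-style messages list (the model call itself is omitted):
messages = [
    {"role": "system",
     "content": build_system_prompt("You are a coding assistant.")},
    {"role": "user", "content": "Refactor this function to remove duplication."},
]
```

The idea is simply that the brevity rules ride along in every request; whether they actually cut reasoning tokens without hurting output quality would need to be measured per model.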
|
|
| ▲ | AdamN 7 hours ago | parent | prev [-] |
| Yeah, you could easily imagine stenography-like inputs and outputs for rapid iteration loops. It's also true that on social media, people already want faster-to-read snippets that drop grammar, so the desire for density is already there for human authors/readers.