layoric a day ago

There are several papers suggesting that the content of the 'thinking' output is largely meaningless to the final answer, and that filler tokens such as dots or pause tokens, which grant the model the same number of additional forward passes, yield similar improvements.

So in a lot of regards the 'thinking' is mostly marketing. (A sketch of the filler-token setup follows the links below.)

- "Think before you speak: Training Language Models With Pause Tokens" - https://arxiv.org/abs/2310.02226

- "Let's Think Dot by Dot: Hidden Computation in Transformer Language Models" - https://arxiv.org/abs/2404.15758

- "Do LLMs Really Think Step-by-step In Implicit Reasoning?" - https://arxiv.org/abs/2411.15862

- Video by bycloud as an overview -> https://www.youtube.com/watch?v=Dk36u4NGeSU
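
If you want to see the mechanics, here's a minimal sketch of the filler-token setup those papers study, assuming a local Hugging Face causal LM (the model name, the question, and the filler length are placeholders I picked, not anything from the papers). One caveat: the pause/dot papers train models to exploit filler tokens, so an off-the-shelf model won't necessarily show a gain; the point is only that the extra forward passes, not the token content, are what the 'thinking' buys you.

    # Sketch: compare answering immediately vs. answering after a run of
    # contentless "." filler tokens. Assumes a local Hugging Face causal
    # LM; the model name is a placeholder.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder, any local LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    question = "What is 17 * 24?"

    def answer(prompt):
        inputs = tokenizer(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
        # Return only the newly generated tokens.
        return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)

    direct = answer(f"Q: {question}\nA:")
    # The filler buys the model extra forward passes before it must
    # commit to an answer, without encoding any actual reasoning.
    filler = " ".join(["."] * 64)
    padded = answer(f"Q: {question}\n{filler}\nA:")

    print("direct:     ", direct)
    print("with filler:", padded)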

Davidzheng a day ago

Lots of papers are insane. You can test it yourself on competition math problems with a local AI: replace its thinking process with dots and see the result.
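
That experiment is easy to script. A minimal sketch, assuming a local Hugging Face reasoning model that wraps its chain of thought in <think>...</think> tags (the model name, tag format, and problem here are placeholders; adjust them to whatever your model actually emits):

    # Generate a full reasoning trace, then re-run with the trace
    # replaced by dots and compare the final answers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # placeholder
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    problem = "Find the sum of all positive integers n <= 100 divisible by 7."

    def generate(text, max_new_tokens=768):
        inputs = tokenizer(text, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False)
        # Keep special tokens so the closing tag survives decoding.
        return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=False)

    # 1. Normal run: let the model think, then answer.
    full = generate(f"Problem: {problem}\n<think>")
    think, _, answer = full.partition("</think>")

    # 2. Ablated run: same prompt, but the reasoning span is replaced by
    #    roughly one dot token per original reasoning token.
    n_tokens = len(tokenizer(think)["input_ids"])
    dots = " ".join(["."] * n_tokens)
    ablated = generate(f"Problem: {problem}\n<think> {dots} </think>")

    print("original answer:   ", answer.strip())
    print("dot-ablated answer:", ablated.strip())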

WA 17 hours ago

So what's the result? I don't have a local LLM. Are the "dot papers" insane, or is it the "thinking happens in actual reasoning tokens" view that's insane?

cayley_graph a day ago

Wow, my first ever video on AI! I'm rather disappointed. That was devoid of meaningful content save for the two minutes where they went over the Anthropic blog post on how LLMs (don't) do addition. Importantly, they didn't remotely approach what those other papers are about, or why thinking tokens aren't important for chain-of-thought. Is all AI content this kind of slop? Sorry, no offense to the above comment, it was just a total waste of 10 minutes that I'm not used to.

So, to anyone more knowledgeable than the proprietor of that channel: can you outline why it's possible to replace thinking tokens with garbage without a decline in output quality?

edit: Section J of the first paper seems to offer some succinct explanations.

layoric 20 hours ago

The video is just an entertaining overview; as indicated, I'm not the author of the video, and it wasn't meant to be a deep dive. I linked the three related papers directly in there. I don't know how much more you were expecting from an HN comment, but this was a pointer in the right direction, not the definitive guide on the matter. This is a you problem.

cayley_graph 19 hours ago

An overview of what? It's entertaining to me when I come away understanding something more than I did before. I expected a high-level explanation of the papers, or at least the faintest intuition behind the phenomenon your comment talked about.

If you watch the video, it doesn't actually say anything besides restating variants of "thinking tokens aren't important" in a few different ways, summarizing a distantly related blog post, and entertaining some wild hypotheses about the future of LLMs. It's unclear whether the producer has any deeper understanding of the subject; it honestly sounded like low-grade LLM-generated fluff. I'm simply not used to that level of lack of substance. It wasn't a personal attack against you, as indicated.