XenophileJKO 8 hours ago
Don't or can't. My assumption is the model no longer actually thinks in tokens, but in internal tensors. This is advantageous because it doesn't have to collapse the decision and can simultaneously propagate many concepts per context position.
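The "collapse the decision" point can be illustrated with a toy sketch (my own illustration, not any real model's code; names like `W_embed` and `step` are made up): token-space chain-of-thought re-embeds one sampled token each step, discarding everything else in the hidden state, while latent-space reasoning carries the full vector forward.

```python
# Toy contrast between token-space and latent-space "reasoning".
# Purely illustrative; real latent-CoT work (e.g. Meta's Coconut) feeds
# the last hidden state back as the next input embedding.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 8, 4
W_embed = rng.normal(size=(VOCAB, DIM))   # token id -> embedding
W_out = rng.normal(size=(DIM, VOCAB))     # hidden state -> token logits
W_h = rng.normal(size=(DIM, DIM)) * 0.3   # stand-in for a transformer step

def step(h):
    return np.tanh(h @ W_h)

# Token-space CoT: collapse to one token each step, then re-embed it.
h = rng.normal(size=DIM)
token_trace = []
for _ in range(3):
    h = step(h)
    tok = int(np.argmax(h @ W_out))  # hard decision: other concepts are lost
    token_trace.append(tok)
    h = W_embed[tok]                 # next step only sees the chosen token

# Latent-space CoT: never collapse; all DIM dimensions carry forward.
h = rng.normal(size=DIM)
for _ in range(3):
    h = step(h)

print(token_trace)   # discrete, human-auditable trace
print(h.round(3))    # continuous state, nothing for a human to read
```

The trade-off in the thread falls out directly: the token trace is readable and rewardable, the latent vector is neither.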
ainch 7 hours ago
I would expect to see a significant wall-clock improvement if that were the case - Meta's Coconut paper was ~3x faster than token-space chain-of-thought, because latents carry much more information than individual tokens.

Separately, I think Anthropic are probably the least likely of the big three to release a model that uses latent-space reasoning, because it's a clear step down in the ability to audit CoT. There has even been some discussion that they accidentally "exposed" the Mythos CoT to RL [0] - I don't see how you would apply a reward function to latent-space reasoning tokens.

[0]: https://www.lesswrong.com/posts/K8FxfK9GmJfiAhgcT/anthropic-...
haellsigh 8 hours ago
If that's true, then we're following the timeline of https://ai-2027.com/
JoshuaDavid 6 hours ago
Don't. The first 500 or so tokens are raw thinking output; then the summarizer kicks in for longer thinking traces. Sometimes longer thinking traces leak through, or the summarizer model (i.e. Claude Haiku) refuses to summarize them and instead includes a direct quote of the passage it won't summarize. The summarizer prompt can be viewed [here](https://xcancel.com/lilyofashwood/status/2027812323910353105...), among other places.
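The behavior described reduces to a simple gate. This is my guess at the shape of it, not Anthropic's actual pipeline; `render_thinking` and the 500-token threshold are taken from the comment above, the rest is hypothetical:

```python
# Sketch of the described display logic: short thinking traces pass
# through raw, longer ones are handed to a summarizer model.
RAW_TOKEN_LIMIT = 500  # approximate figure from the comment above

def render_thinking(tokens, summarize):
    """tokens: list of thinking tokens; summarize: callable standing in
    for the summarizer model (e.g. Claude Haiku)."""
    if len(tokens) <= RAW_TOKEN_LIMIT:
        return " ".join(tokens)          # raw trace shown verbatim
    return summarize(" ".join(tokens))   # long trace gets summarized

print(render_thinking(["step"] * 10, summarize=lambda t: "SUMMARY"))
print(render_thinking(["step"] * 600, summarize=lambda t: "SUMMARY"))
```

The leaks mentioned above would then just be failure modes of the `summarize` call: a refusal or a verbatim quote comes back in place of an actual summary.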
WhitneyLand 7 hours ago
No - there is research in that direction and it shows some promise, but that's not what's happening here.
7 hours ago
[deleted]
alex7o 8 hours ago
Most likely. Would be cool to see an open-source model use diffusion for thinking.
motoboi 7 hours ago
Don't. Thinking right now is just text: chain of thought, but just regular tokens and text being output by the model.