| ▲ | paxys 3 hours ago | |
Faster tokens = more reasoning loops, so it can actually make the models smarter as well. | ||
| ▲ | girvo 19 minutes ago | parent [-] | |
Yeah! So at a much smaller scale, being able to boost Step 3.7 Flash up to 40 tok/s on my Spark-alike with proper triple-head MTP (multi-token prediction) was the thing that made it superior to Qwen 3.6 27B in wall-clock time, despite Step reasoning more. A lot of the open Chinese models get their results through huge reasoning loops. Being able to boost decode perf is what will make them worth it, and I’m sure OpenAI and Anthropic could do similar (if they aren’t already). | ||
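To make the wall-clock point concrete, here is a rough back-of-the-envelope sketch. Only the 40 tok/s figure comes from the comment above; the token counts and the 10 tok/s rate for the other model are made-up placeholders, not measurements, and `wall_clock_seconds` is a hypothetical helper that ignores prompt processing.

```python
def wall_clock_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Decode time only: tokens generated divided by decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Assumption: a model that reasons a lot but decodes at 40 tok/s.
# The 8000-token count is a placeholder, not a measurement.
verbose_fast = wall_clock_seconds(output_tokens=8000, tokens_per_second=40.0)

# Assumption: a model that reasons less but decodes at 10 tok/s.
# Both numbers here are placeholders.
terse_slow = wall_clock_seconds(output_tokens=3000, tokens_per_second=10.0)

print(f"verbose but fast: {verbose_fast:.0f} s")
print(f"terse but slow:   {terse_slow:.0f} s")
```

With these placeholder numbers, the model that emits more than twice as many reasoning tokens still finishes first, because its decode rate is four times higher.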