| ▲ | nyrikki 2 hours ago | |
It is quite likely that the intermediate tokens don’t have ‘semantic import’ [0]. There are methods like Habitual Reasoning Distillation or Inverted Reasoning Traces [1] that can help. While there are reasons to hide the intermediate tokens from an IP protection standpoint, there is also a need to hide more effective and efficient generation that doesn’t fit the R1 claim of an ‘aha moment’, which has been debunked but remains a consumer expectation. While hidden intermediate tokens do increase the difficulty, they are not a firm barrier in themselves, especially since they are billed, which gives away information about their length. | ||