dheera 7 hours ago
Although it's more likely they are protecting secret sauce in this case, I'm wondering if there is an alternate explanation: that LLMs reason better when they are NOT reasoning through natural language output tokens, but instead carry out the reasoning further upstream in the transformer.