miguel_martin 11 hours ago
> The “answer before reasoning” is good evidence for it. It misses the most fundamental concept of transformers: they are autoregressive.

I don't think it's fair to assume the author doesn't understand how transformers work. Their intention with this instruction appears to be to aggressively reduce output token cost, i.e. I read it as a hack to emulate the Qwen model series' /nothink instruction. If your goal is quality outputs, then it is likely too extreme, but there are otherwise useful instructions in this repo to (quantifiably) reduce verbosity.
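For context, a minimal sketch of what the Qwen-style "no thinking" switch looks like, assuming the Hugging Face transformers API and a Qwen3 checkpoint (the model name and generation settings here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "Summarize transformers in one sentence."}]

# Qwen3's chat template exposes an `enable_thinking` flag; setting it to
# False suppresses the <think>...</think> reasoning block before the answer.
# (Qwen3 also supports appending "/no_think" to the user message as a
# per-turn soft switch.)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
))
```

The "answer before reasoning" prompt instruction tries to get the same token savings from models that don't expose such a switch, which is why it reads as a cost hack rather than a misunderstanding of autoregression.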
motoboi 10 hours ago
If they want to reduce token cost, just use a smaller model instead of dumbing down a more expensive one.