miguel_martin 11 hours ago
> The “answer before reasoning” is good evidence for it. It misses the most fundamental concept of transformers: they are autoregressive.

I don't think it's fair to assume the author doesn't understand how transformers work. Their intention with this instruction appears to be to aggressively reduce output token cost, i.e. I read it as a hack to emulate the Qwen model series' /nothink instruction. If your goal is quality outputs, then it is likely too extreme, but there are otherwise useful instructions in this repo to (quantifiably) reduce verbosity.
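For context, a minimal sketch of what the Qwen-style "no thinking" switch looks like, assuming the Hugging Face transformers API and a Qwen3 checkpoint (the model name and generation settings here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "Summarize transformers in one sentence."}]

# Qwen3's chat template exposes an `enable_thinking` flag; setting it to
# False suppresses the <think>...</think> reasoning block before the answer.
# (Qwen3 also supports appending "/no_think" to the user message as a
# per-turn soft switch.)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
))
```

The "answer before reasoning" prompt instruction tries to get the same token savings from models that don't expose such a switch, which is why it reads as a cost hack rather than a misunderstanding of autoregression.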
motoboi 10 hours ago
If they want to reduce token cost, just use a smaller model instead of dumbing down a more expensive one.