Remix.run Logo
motoboi 12 hours ago

Things like this make me sad because they make obvious that most people don’t understand a bit about how LLM work.

The “answer before reasoning” is a good evidence for it. It misses the most fundamental concept of tranaformers: the are autoregressive.

Also, the reinforcement learning is what make the model behave like what you are trying to avoid. So the model output is actually what performs best in the kind of software engineering task you are trying to achieve. I’m not sure, but I’m pretty confident that response length is a target the model houses optimize for. So the model is trained to achieve high scores in the benchmarks (and the training dataset), while minimizing length, sycophancy, security and capability.

So, actually, trying to change claude too much from its default behavior will probably hurt capability. Change it too much and you start veering in the dreaded “out of distribution” territory and soon discover why top researcher talk so much about not-AGI-yet.

bitexploder 11 hours ago | parent | next [-]

Forcing short responses will hurt reasoning and chain of thought. There are some potential benefits but forcing response length and when it answers things ironically increases odds of hallucinations if it prioritizes getting the answer out. If it needed more tokens to reason with and validate the response further. It is generally trained to use multiple lines to reason with. It uses english as its sole thinking and reasoning system.

For complex tasks this is not a useful prompt.

nearbuy 11 hours ago | parent | prev | next [-]

> Answer is always line 1. Reasoning comes after, never before.

This doesn't stop it from reasoning before answering. This only affects the user-facing output, not the reasoning tokens. It has already reasoned by the time it shows the answer, and it just shows the answer above any explanation.

motoboi 10 hours ago | parent [-]

The output is part of context. The model reason but also output tokens. Force it to respond in an unfamiliar format and the next token will veer more and more from the training distribution, rendering the model less smart/useful.

miguel_martin 11 hours ago | parent | prev | next [-]

>The “answer before reasoning” is a good evidence for it. It misses the most fundamental concept of tranaformers: the are autoregressive.

I don't think it's fair to assume the author doesn't understand how transformers work. Their intention with this instruction appears to aggressively reduce output token cost.

i.e. I read this instruction as a hack to emulate the Qwen model series's /nothink token instruction

If you're goal is quality outputs, then it is likely too extreme, but there are otherwise useful instructions in this repo to (quantifiably) reduce verbosity.

motoboi 10 hours ago | parent [-]

If they want to reduce token cost, just use a smaller model instead of dumbing down a more expensive.

krackers 11 hours ago | parent | prev | next [-]

Don't most providers already provide API control over the COT length? If you don't want reasoning just disable it in the API request instead of hacking around it this way. (Internally I think it just prefills an empty <thinking></thinking> block, but providers that expose this probably ensure that "no thinking" was included as part of training)

Skidaddle 11 hours ago | parent | prev [-]

To me it’s as simple as “who knows best how to harness the premier LLM – Anthropic, the lab that created it, or this random person?”

That’s why I’m only interested in first party tools over things like OpenCode right now.

andrewmcwatters 10 hours ago | parent [-]

[dead]