▲ proxysna 4 hours ago
You need to set sampling parameters for the LLM. Had the same issue with Qwen3.5 when I first started. You can usually grab them off the model card page. From the Qwen3.6 page:

Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

Instruct (or non-thinking) mode: temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
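A minimal sketch of wiring those presets into an OpenAI-style chat request. Only the numeric values come from the comment above; the preset names, the `build_request` helper, and the model name are placeholders, and non-standard fields like `top_k`, `min_p`, and `repetition_penalty` are passed via `extra_body`, which servers such as vLLM accept:

```python
# The three sampling presets quoted above (values from the model card).
PRESETS = {
    "thinking_general": dict(temperature=1.0, top_p=0.95, top_k=20,
                             min_p=0.0, presence_penalty=0.0,
                             repetition_penalty=1.0),
    "thinking_coding": dict(temperature=0.6, top_p=0.95, top_k=20,
                            min_p=0.0, presence_penalty=0.0,
                            repetition_penalty=1.0),
    "instruct": dict(temperature=0.7, top_p=0.80, top_k=20,
                     min_p=0.0, presence_penalty=1.5,
                     repetition_penalty=1.0),
}

def build_request(prompt: str, mode: str = "thinking_coding",
                  model: str = "qwen3") -> dict:
    """Build an OpenAI-style chat-completions payload from a preset.

    temperature / top_p / presence_penalty are standard fields;
    top_k, min_p, and repetition_penalty are not in the OpenAI spec,
    so they go into extra_body for servers that support them.
    """
    p = PRESETS[mode]
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": p["temperature"],
        "top_p": p["top_p"],
        "presence_penalty": p["presence_penalty"],
        "extra_body": {
            "top_k": p["top_k"],
            "min_p": p["min_p"],
            "repetition_penalty": p["repetition_penalty"],
        },
    }
```

The point is simply that these knobs must be set explicitly per request (or as server defaults); many clients otherwise fall back to their own defaults, which can differ wildly from what the model was tuned for.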
▲ deanc 4 hours ago | parent | next [-]
Yes, I have tried all of these (as per the docs). Have you actually tried them? Because I have tried all three configurations you mentioned with agentic coding and get the same result: loops.
▲ Der_Einzige 2 hours ago | parent | prev [-]
min_p author here. min_p is strictly better than top_p and top_k. The big labs don't know shit about sampling, and give absolutely nuts recommendations like this. Set min_p to something like 0.3, ignore top_p and top_k, and you'll be fine. There are better samplers now (top-n-sigma, top-H, P-less decoding, etc.), but they're often not available in your LLM inference engine (e.g. vLLM).