Remix.run Logo
CamperBob2 3 hours ago

What quant? I just ran Repeat the word "potato" 100 times, numbered and it worked fine, taking 44 seconds at 24 tokens/second. Command line:

    llama-server ^
      --model Qwen3.5-27B-BF16-00001-of-00002.gguf ^
      --mmproj mmproj-BF16.gguf ^
      --fit on ^
      --host 127.0.0.1 ^
      --port 2080 ^
      --temp 0.8 ^
      --top-p 0.95 ^
      --top-k 20 ^
      --min-p 0.00 ^
      --presence_penalty 1.5 ^
      --repeat_penalty 1.1 ^
      --no-mmap ^
      --no-warmup
The repeat and/or presence penalties seem to be somewhat sensitive with this model, so that might have caused the looping you saw.
throwdbaaway 12 minutes ago | parent [-]

I don't quite get the low temperature coupled with the high penalty. We get thinking loop due to low temperature, and we then counter it with high penalty. That seems backward.

For Qwen3.5 27B, I got good result with --temp 1.0 --top-p 1.0 --top-k 40 --min-p 0.2, without penalty. It allows the model to explore (temp, top-p, top-k) without going off the rail (min-p) during reasoning. No loop so far.