PunchyHamster 4 hours ago

I asked it to recite "potato" 100 times because I wanted to benchmark CPU vs GPU speed. It's on line 150 of planning. It has already recited the requested thing 4 times and started drafting the 5th response.

...yeah I doubt it

lachiflippi 4 hours ago | parent | next [-]

Qwen3.5 pretty much requires a long system prompt; otherwise it goes into a weird planning mode where it reasons for minutes about what to do and double- and triple-checks everything it does. Both Gemini's and Claude Opus 4.6's prompts work pretty well, but they are so long that whatever you're using to run the model has to support prompt caching. Asking it to "Say the word "potato" 100 times, once per line, numbered.", for example, results in the following reasoning, followed by the word "potato" on 100 numbered lines, even with the smallest (and therefore dumbest) quant, unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ2_XXS:

"User is asking me to repeat the word "potato" 100 times, numbered. This is a simple request - I can comply with this request. Let me create a response that includes the word "potato" 100 times, numbered from 1 to 100.

I'll need to be careful about formatting - the user wants it numbered and once per line. I should use minimal formatting as per my instructions."
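
For reference, a minimal sketch of wiring a long system prompt into a locally served model through an OpenAI-compatible /v1/chat/completions endpoint. The URL, port, and model name below are placeholders I made up, not anything specified in this thread. The relevant detail for prompt caching is that the system prompt must stay byte-identical across calls so the server can reuse the cached prefix:

```python
# Hedged sketch: send a fixed system prompt to a local OpenAI-compatible
# server. Endpoint URL and model name are assumptions, not from the thread.
import json
import urllib.request

# Keeping this string identical across requests is what lets the server's
# prompt cache reuse the already-processed prefix.
SYSTEM_PROMPT = "You are a concise assistant. Answer directly; do not re-plan."

def build_request(user_msg,
                  model="qwen3.5",
                  url="http://127.0.0.1:2080/v1/chat/completions"):
    """Build the HTTP request for a chat completion call."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.8,
    }
    data = json.dumps(payload).encode()
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})

if __name__ == "__main__":
    req = build_request(
        'Say the word "potato" 100 times, once per line, numbered.')
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```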

PunchyHamster 4 hours ago | parent [-]

Good to know, thanks. I just ran Ollama with qwen3.5:27b. Currently it's stuck picking a format:

    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a table.
    No, text is fine.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a bullet list.
    No, just lines.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a numbered list.
    No, lines are fine.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a code block.
    Yes.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a pre block.
    Code block is better.
... (continues like this for the next 100 lines)
lachiflippi 4 hours ago | parent | next [-]

Yeah, it tends to get stuck in loops like that a lot with everything set to defaults. I wonder if they distilled Gemini at some point; I've seen Gemini get stuck in a similar "I will now do [thing]. I am preparing to do [thing]. I will do it." failure mode a couple of times as well.

xmddmx 3 hours ago | parent | prev | next [-]

See my other note [1] about bugs in Ollama with Qwen3.5.

I just tried this (Ollama macOS 0.17.4, qwen3.5:35b-a3b-q4_K_M) on a M4 Pro, and it did fine:

[Thought for 50.0 seconds]

1. potato 2. potato [...] 100. potato

In other words, it did great.

I think 50 seconds of thinking beforehand was perhaps excessive, though.

[1] https://news.ycombinator.com/item?id=47202082

CamperBob2 2 hours ago | parent | prev [-]

What quant? I just ran the prompt "Repeat the word 'potato' 100 times, numbered" and it worked fine, taking 44 seconds at 24 tokens/second. Command line:

    llama-server ^
      --model Qwen3.5-27B-BF16-00001-of-00002.gguf ^
      --mmproj mmproj-BF16.gguf ^
      --fit on ^
      --host 127.0.0.1 ^
      --port 2080 ^
      --temp 0.8 ^
      --top-p 0.95 ^
      --top-k 20 ^
      --min-p 0.00 ^
      --presence_penalty 1.5 ^
      --repeat_penalty 1.1 ^
      --no-mmap ^
      --no-warmup
The repeat and/or presence penalties seem to be somewhat sensitive with this model, so that might have caused the looping you saw.
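
To illustrate that sensitivity: here's a rough pure-Python sketch (my own simplification, not llama.cpp's actual sampler code) of how a repeat penalty and a presence penalty reshape next-token logits. With a presence penalty of 1.5, every token the model has already emitted, including "potato" itself, gets pushed down hard, which is exactly the wrong pressure for a task that demands the same word 100 times:

```python
# Simplified sketch of repetition/presence penalties on next-token logits.
# Loosely follows llama.cpp's sampler semantics, but is not its actual code.
def apply_penalties(logits, prev_tokens, repeat_penalty=1.1,
                    presence_penalty=1.5):
    """Return a new logits dict with penalties applied to seen tokens."""
    out = dict(logits)
    for tok in set(prev_tokens):
        if tok not in out:
            continue
        # Repetition penalty: shrink the logit toward "less likely".
        if out[tok] > 0:
            out[tok] /= repeat_penalty
        else:
            out[tok] *= repeat_penalty
        # Presence penalty: flat subtraction for any token seen at all.
        out[tok] -= presence_penalty
    return out

logits = {"potato": 4.0, "wait": 1.0, "okay": -0.5}
penalized = apply_penalties(logits, prev_tokens=["potato", "potato", "wait"])
# "potato" drops from 4.0 to 4.0/1.1 - 1.5 ≈ 2.14; "okay" is untouched.
```

Repeated emissions keep the penalty applied on every step, so by line 50 of "potato" the intended token may no longer be the most likely one, and the model wanders off into re-planning instead.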
lumirth 4 hours ago | parent | prev [-]

well hold on now, maybe it’s onto something. do you really know what it means to “recite” “potato” “100” “times”? each of those words could be pulled apart into a dissertation-level thesis and analysis of language, history, and communication.

either that, or it has a delusional level of instruction following. doesn’t mean it can’t code like sonnet though

PunchyHamster 4 hours ago | parent [-]

It's still amusing to see how seemingly simple things like this put it into a loop. It's still going.

> do you really know what it means to “recite” “potato” “100” “times”?

Asking the user a question is an option. Sonnet did that a bunch when I was trying to debug a network issue. It also forgot facts that had been checked for it and that it had been told earlier...

lumirth 3 hours ago | parent [-]

I wonder how much certain models have been trained to avoid asking too many questions. I’ve had coworkers who’ll complete an entire project before asking a single additional question to management, and it has never gone well for them. Unsurprising that the same would be true for the “managing AI” era of programming.

The thing I struggle most with, honestly, is when AI (usually GPT5.3-Codex) asks me a question and I genuinely don’t know the answer. I’m just like “well, uh… follow industry best practice, please? unless best practice is dumb, I guess. do a good. please do a good.” And then I get to find out what the answer should’ve been the hard way.