bjackman 4 hours ago

Lately with Gemini CLI / Jules it doesn't seem like time spent is a good proxy for difficulty. It has a big problem with getting into loops of "I am preparing the response for the user. I am done. I will output the answer. I am confident. Etc etc".

I see this directly in Gemini CLI, as the harness detects loops and bails out of the reasoning. But I've also occasionally seen it take 15m+ to do trivial stuff, and I suspect that's a symptom of a similar issue.
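For anyone curious what "the harness detects loops" might look like: a minimal sketch, assuming a simple repeated-tail heuristic over the streamed text. This is NOT Gemini CLI's actual algorithm (that isn't documented here), just one plausible way a harness could catch the "I am preparing the response..." repetition and bail.

```python
# Hypothetical loop detector: flag the stream if its tail is just the same
# chunk repeated back-to-back. Not Gemini CLI's real implementation, only an
# illustration of the general technique.

def is_looping(text: str, chunk_len: int = 40, min_repeats: int = 3) -> bool:
    """Return True if the last chunk_len characters repeat min_repeats times."""
    window = chunk_len * min_repeats
    if len(text) < window:
        return False
    tail = text[-window:]
    chunk = tail[-chunk_len:]
    return tail == chunk * min_repeats

# The repeated sentence below happens to be exactly 42 characters long.
stream = "I am preparing the response for the user. " * 5
print(is_looping(stream, chunk_len=42, min_repeats=3))  # → True
```

A real harness would presumably run something like this over tokens rather than characters, and with fuzzier matching, but the bail-out behaviour users see is consistent with this kind of check.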

mixel 44 minutes ago

I saw this too. Sometimes it "thinks" inside of the actual output, and it's much more likely to end up in the "I am ready to answer" loop while it's doing that.

sva_ an hour ago

I feel like sometimes it just loops those messages when it isn't actually generating new tokens. But I might be wrong.

bjackman 38 minutes ago

There are some other failure modes that all feel vaguely related, and that probably help with building a hypothesis about what's going wrong:

Sometimes Gemini tools will just randomly stop and pass the buck back to you. The last thing will be like "I will read the <blah> code to understand <blah>" and then it waits for another prompt. So I just type "continue" and it starts work again.

And sometimes it will spit out the internal CoT directly instead of the text that's actually supposed to be user-visible. So I'll see a bunch of paragraphs starting with "Wait, " as it works stuff out, and at the end it says "I understand the issue" or whatever, then waits for a prompt. I type "summarise" and it gives me the bit I actually wanted.

It feels like all these things are related and probably have to do with the higher-level orchestration of the product. Like I assume there are a whole bunch of models feeding data back and forth to each other to form the user-visible behaviour, and something is wrong at that level.