I think Gemini is an excellent model, it's just not a particularly great agent. One of the reasons is that its code output is often structured in a way that looks like it's answering a question, rather than generating production code. It leaves comments everywhere, which are often numbered (which not only is annoying, but also only makes sense if the numbering starts within the frame of reference of the "question" it's "answering").

It's also just not as good at being self-directed and doing all of the rest of the agent-like behaviors we expect, i.e. breaking down into todolists, determining the appropriate scope of work to accomplish, proper tool calling, etc.

▲

freedomben 8 hours ago | parent | next [-]

Yeah, you may have nailed it. Gemini is a good model, but in the Gemini CLI with a prompt like, "I'd like to add <feature x> support. What are my options? Don't write any code yet" it will proceed to skip right past telling me my options and will go ahead an implement whatever it feels like. Afterward it will print out a list of possible approaches and then tell you why it did the one it did.

Codex is the best at following instructions IME. Claude is pretty good too but is a little more "creative" than codex at trying to re-interpret my prompt to get at what I "probably" meant rather than what I actually said.

▲

phainopepla2 4 hours ago | parent | next [-]

Try the conductor extension for gemini-cli: https://github.com/gemini-cli-extensions/conductor

It won't make any changes until a detailed plan is generated and approved.

▲

michaelcampbell 7 hours ago | parent | prev | next [-]

Can you (or anyone) explain how this might be? The "agent" is just a passthrough for the model, no? How is one CLI/TUI tool better than any other, given the same model that it's passing your user input to?

I am familiar with copilot cli (using models from different providers), OpenCode doing the same, and Claude with just the \A models, but if I ask all 3 the same thing using the same \A model, I SHOULD be getting roughly the same output, modulo LLM nondeterminism, right?

	▲	taylorius 3 hours ago \| parent [-]
		maybe different preparatory "system" prompts?

▲

PantaloonFlames 4 hours ago | parent | prev [-]

I've had the exact opposite experience. After including in my prompt "don't write any code yet" (or similar brief phrase), Gemini responds without writing code.

Using Gemini 2.5 or 3, flash.

▲

sutterd 7 hours ago | parent | prev [-]

My go-to models have been Claude and Gemini for a long time. I have been using Gemini for discussions and Claude for coding and now as an agent. Claude has been the best at doing what I want to do and not doing what I don’t want to do. And then my confidence in it took a quantum leap with Opus 4.5. Gemini seems like it has gotten even worse at doing what I want with new releases.