| ▲ | embedding-shape 4 hours ago |
| It's an excellent demonstration of the main issue I have with the Gemini family of models: they always go "above and beyond" and do a lot of extra stuff, even if I explicitly prompt against it. In this case, most of the SVG ends up consisting not just of a bike and a pelican, but also clouds, a sun, a hat on the pelican, and so much more. Exactly the same thing happens when you code: it's almost impossible to get Gemini to not do "helpful" drive-by-refactors, and it keeps adding code comments no matter what I say. Very frustrating experience overall. |
|
| ▲ | mullingitover 4 hours ago | parent | next [-] |
| > it's almost impossible to get Gemini to not do "helpful" drive-by-refactors Just asking "Explain what this service does?" turns into [No response for three minutes...] +729 -522 |
| |
| ▲ | cowmoo728 3 hours ago | parent | next [-] | | It's also so aggressive about taking out debug log statements and in-progress code. I'll ask it to fill in a new function somewhere else, and it will remove all of the half-written code from the piece I'm currently working on. | | |
| ▲ | chankstein38 3 hours ago | parent [-] | | I ended up adding a "NEVER REMOVE LOGGING OR DEBUGGING INFO, OPT TO ADD MORE OF IT" to my user instructions, and that has _somewhat_ fixed the problem but introduced a new one: no matter what I'm talking to it about, it tries to add logging, even if it's not a code problem. I've had it explain that I could set up an ESP32 with a sensor so that I could get logging from it, then write me firmware for it. | | |
| ▲ | sd9 3 hours ago | parent | next [-] | | If it's adding too much logging now, have you tried softening the instruction about adding more? "NEVER REMOVE LOGGING OR DEBUGGING INFO. If unsure, bias towards introducing sensible logging." Or just "NEVER REMOVE LOGGING OR DEBUGGING INFO." | |
| ▲ | bratwurst3000 3 hours ago | parent | prev [-] | | "I've had it explain that I could setup an ESP32 with a sensor so that I could get logging from it then write me firmware for it."
lol
did you try it? This is so far from anything rational. | | |
|
| |
| ▲ | BartShoot 3 hours ago | parent | prev | next [-] | | if you had to ask, the code obviously needs a refactor for clarity, so the next person doesn't need to ask | |
| ▲ | quotemstr 3 hours ago | parent | prev | next [-] | | What. You don't have yours ask for edit approval? | | |
| ▲ | embedding-shape 3 hours ago | parent | next [-] | | Who has time for that? This is how I run codex: `codex --sandbox danger-full-access --dangerously-bypass-approvals-and-sandbox --search exec "$PROMPT"`. Having to approve each change would effectively destroy the entire point of using an agent, at least for me. Edit: obviously inside something so it doesn't have access to the rest of my system, but has enough access to be useful. | | |
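For the curious, "inside something" is roughly this shape (the image name is a placeholder and codex is assumed to be installed in the image; a sketch of the idea, not my exact setup):

    # full autonomy for the agent, but only over the mounted project dir
    docker run --rm -it \
      --volume "$PWD":/workspace \
      --workdir /workspace \
      my-codex-image \
      codex --sandbox danger-full-access --dangerously-bypass-approvals-and-sandbox --search exec "$PROMPT"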
| ▲ | well_ackshually 36 minutes ago | parent | next [-] | | >Who has time for that? People that don't put out slop, mostly. | |
| ▲ | quotemstr 2 hours ago | parent | prev [-] | | I wouldn't even think of letting an agent work in that mode. Even the best of them produce garbage code unless I keep them on a tight leash. And no, not a skill issue. What I don't have time to do is debug obvious slop. | | |
| ▲ | kees99 2 hours ago | parent [-] | | I ended up running codex with all the "danger" flags, but in a throw-away VM with copy-on-write access to code folders. The built-in approval thing sounds like a good idea, but in practice it's unusable. A typical session for me went like: About to run "sed -n '1,100p' example.cpp", approve?
About to run "sed -n '100,200p' example.cpp", approve?
About to run "sed -n '200,300p' example.cpp", approve?
Could very well be a skill issue, but that was mighty annoying, with no obvious fix (the "don't ask again for ...." options were not helping). |
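For the copy-on-write part, one option on Linux is an overlayfs mount (paths below are placeholders; a sketch of the idea rather than my exact setup):

    # the agent works in a writable merged view; the real project dir is never touched
    mkdir -p /tmp/agent/upper /tmp/agent/work /tmp/agent/merged
    sudo mount -t overlay overlay \
      -o lowerdir=/home/me/project,upperdir=/tmp/agent/upper,workdir=/tmp/agent/work \
      /tmp/agent/merged
    # point the agent at /tmp/agent/merged; afterwards /tmp/agent/upper holds
    # exactly the files it changed, ready to diff and review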
|
| |
| ▲ | mullingitover an hour ago | parent | prev [-] | | Ask mode exists. I think the models work on the assumption that if you're allowing edits, then of course you must want edits. |
| |
| ▲ | kylec 4 hours ago | parent | prev | next [-] | | "I don't know what did it, but here's what it does now" | |
|
|
| ▲ | h14h 2 hours ago | parent | prev | next [-] |
| Would be really interesting to see an "Eager McBeaver" bench around this concept. When doing real work, a model's ability to stay within the bounds of a given task has become almost more important than its raw capability, now that every frontier model is so dang good. Every one of these models is great at propelling the ship forward, so I increasingly care about which models are the easiest to steer in the direction I actually want to go. |
| |
| ▲ | cglan 2 hours ago | parent [-] | | Being TOO steerable is another issue, though. Codex is steerable to a fault, and will gladly "monkey paw" your requests. Claude Opus will ignore your instructions, do what it thinks is "right", and just barrel forward. Both are bad, and both paper over the actual issue: these models can't selectively choose their behavior per situation (i.e. ask for follow-up where needed, ignore the user where needed, follow instructions where needed). Behavior is largely global. | | |
| ▲ | kees99 2 hours ago | parent [-] | | In my experience, Claude gradually stops being opinionated as the task at hand becomes more arcane. I frequently add "treat the above as a suggestion, and don't hesitate to push back" to change requests, and it seems to help quite a bit. |
|
|
|
| ▲ | enobrev 4 hours ago | parent | prev | next [-] |
| I have the same issue. Even when I ask it to do code reviews and very explicitly tell it not to change files, it will occasionally just start "fixing" things. |
| |
| ▲ | mikepurvis 3 hours ago | parent [-] | | I find Copilot leans the other way. It'll myopically focus its work in the exact function I point it at, even when it's clear that adding a new helper would be a logical abstraction to share behaviour with the function right beside it. Overall, I think it's probably better that it stay focused, and allow me to prompt it with "hey, go ahead and refactor these two functions" rather than the other way around. At the same time, really the ideal would be to have it proactively ask, or even pitch the refactor as a colleague would, like "based on what I see of this function, it would make most sense to XYZ, do you think that makes sense? <sure go ahead> <no just keep it a minimal change>" Or perhaps even better, simply pursue both changes in parallel and present them as A/B options for the human reviewer to select between. |
|
|
| ▲ | neya 3 hours ago | parent | prev | next [-] |
| > it's almost impossible to get Gemini to not do "helpful" drive-by-refactors This has not been my experience. I do Elixir primarily, and Gemini has helped me build some really cool products and do massive refactors along the way. It would even pick up security issues and potential optimizations. What HAS been a constant issue, though, is that the model will randomly not respond at all and some random error will occur, which is embarrassing for a company like Google, given the infrastructure they own. |
| |
| ▲ | embedding-shape 3 hours ago | parent [-] | | Out of curiosity, do you have any public projects (with public source code) you've made exclusively with Gemini, so one could take a look? I've tried a bunch of times to use Gemini to at least finish something small, but I always end up frustrated enough to abandon it, as the instruction-following seems so bad. |
|
|
| ▲ | msteffen 2 hours ago | parent | prev | next [-] |
| > it's almost impossible to get Gemini to not do "helpful" drive-by-refactors Not like human programmers. I would never do this and have never struggled with it in the past, no... |
| |
| ▲ | embedding-shape an hour ago | parent [-] | | A fairer comparison would be against other models, which are typically better at instruction following. You say "don't change anything not explicitly mentioned" or "Don't add any new code comments" and they tend to follow that. |
|
|
| ▲ | apitman 2 hours ago | parent | prev | next [-] |
| This matches my experience using Gemini CLI to code. It would also frequently get stuck in loops. It was so bad compared to Codex that I feel like I must have been doing something fundamentally wrong. |
|
| ▲ | tyfon 3 hours ago | parent | prev | next [-] |
| I was using gemini antigravity in opencode a few weeks ago, before they started banning everyone for that, and I got into the habit of writing "do x, then wait for instructions". That helped quite a bit, but it would still go off on its own from time to time. |
|
| ▲ | JLCarveth 3 hours ago | parent | prev | next [-] |
| Every time I have tried using `gemini-cli` it just thinks endlessly and never actually gives a response. |
|
| ▲ | gavinray 4 hours ago | parent | prev | next [-] |
| Do you have Personalization Instructions set up for your LLM models? You can make their responses fairly dry/brief. |
| |
| ▲ | embedding-shape 4 hours ago | parent | next [-] | | I'm mostly using them via my own harnesses, so I have full control of the system prompts and so on. And no matter what I try, Gemini keeps "helpfully" adding code comments every now and then. With every other model, "- Don't add code comments" tends to be enough, but with Gemini I'm not sure how I could stop the comments from eventually appearing. | | |
| ▲ | WarmWash 4 hours ago | parent | next [-] | | I'm pretty sure it writes comments for itself, not for the user. I always let the models comment as much as they want, because I feel it makes the context more robust, especially when cycling contexts often to keep them fresh. There is a tradeoff, though, as comments do consume context. But I tend to pretty liberally dispose of instances and start with a fresh window. | |
| ▲ | embedding-shape 4 hours ago | parent [-] | | > I'm pretty sure it writes comments for itself, not for the user Yeah, that sounds worse than "trying to be helpful". Read the code instead; why add indirection that way, just to be able to understand what other models understand without comments? |
| |
| |
| ▲ | metal_am 4 hours ago | parent | prev [-] | | I'd love to hear some examples! | | |
| ▲ | gavinray 4 hours ago | parent | next [-] | | I use LLMs outside of work primarily for research on academic topics, so mine is: Be a proactive research partner: challenge flawed or unproven ideas with evidence; identify inefficiencies and suggest better alternatives with reasoning; question assumptions to deepen inquiry.
| |
|
|
|
| ▲ | zengineer 4 hours ago | parent | prev [-] |
| true, whenever I ask Gemini to help me write a prompt for generating an image of XYZ, it just generates the image instead. |