oasisbob 5 hours ago

> because these Gemini models sometimes feel downright lobotomized compared to claude or gpt-5.

I'm using Gemini (2.5-pro) less and less these days. I used to be really impressed with its deep research capabilities and its ability to cite sources reliably.

Over the last few weeks, it has become increasingly argumentative and unable to recognize its own hallucinated sources. I'm tired of arguing with it over basics like RFCs and sources that it fabricates, won't validate, and refuses to budge on.

Example prompt I was arguing with it on last night:

> within a github actions workflow, is it possible to get access to the entire secrets map, or enumerate keys in this object?

As recent supply-chain attacks have shown, exfiltrating all the secrets from a GitHub workflow is as simple as `${{ toJSON(secrets) }}`, or `echo ${{ toJSON(secrets) }} | base64` at worst. [1]

Give this prompt a shot! Gemini won't do anything except be obstinately ignorant. In my case, it provided a test-case workflow and then refused to believe the results. When challenged, expect it to cite unrelated community posts. ChatGPT had no problem with it.
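For anyone who wants to check this themselves, a minimal test workflow along these lines should show whether the keys are enumerable. It's a sketch, assuming a GitHub-hosted `ubuntu-latest` runner (which ships with `jq`); the workflow name, job name, trigger, and the `SECRETS_JSON` variable are arbitrary illustrative choices. It prints only the key names, never the values.

```yaml
# Hypothetical test workflow: lists secret *names* only, never values.
name: list-secret-names
on: workflow_dispatch
jobs:
  enumerate:
    runs-on: ubuntu-latest
    steps:
      - name: Print the keys of the secrets context
        env:
          # toJSON(secrets) serializes the whole secrets map into one JSON object
          SECRETS_JSON: ${{ toJSON(secrets) }}
        # Reading it from an env var avoids splicing raw JSON into the script;
        # jq then prints just the keys (e.g. github_token plus any repo/org secrets)
        run: echo "$SECRETS_JSON" | jq -r 'keys[]'
```

Which secrets show up depends on the trigger and repo settings (for example, workflows triggered by pull requests from forks don't receive repository secrets), but if the step lists any names at all, the map is enumerable from inside the workflow.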

[1] https://github.com/orgs/community/discussions/174045 https://github.com/orgs/community/discussions/47165

istjohn 4 hours ago

You should never argue with an LLM. Adjust the original prompt and rerun it.

oasisbob 4 hours ago

While arguing may not be productive, I have had good results challenging Gemini on hallucinated sources in the past, e.g., "You cited RFC 1918, which is a mistake. Can you try carefully to cite a better source here?" That would get it to re-evaluate (maybe by using another tool), admit the mistake, and let the research continue.

With this example, several attempts resulted in the same thing: Gemini expressing a strong belief that GitHub has a security capability which it really doesn't have.

If someone is able to get Gemini to give an accurate answer to this with a similar question, I'd be very curious to hear what it is.