devin 4 hours ago

The flip side of this problem is that it is also easy to phrase a prompt in a way that invites _too much_ criticism, so you wind up sycophantic in the other direction: the completion rejects a perfectly good idea because the prompt leans a little bit that way.

One reaction to this might be "well, that's not what I mean; that suggests you're prompting with too much directionality," which could be further condensed to "you're prompting wrong." The trouble is that _even when I am trying to be extremely precise and avoid biasing the result_, I will still see the output and go "ah shit, I can see it 'aligning' with whatever dumb thing I've just said as if it is a good/plausible direction."

At that point it starts to feel like the prompt is more dice roll than skill at times, which makes me feel like I'm operating a fancy knowledge slot machine.

Paracompact 4 hours ago

What it actually suggests is that the AI's responses to these questions of judgment have little correlation with the thing it's judging. Sure, you can get it to be complimentary, if you want it to be. Sure, you can get it to be critical, if you want it to be. But what if I don't know whether my design needs to be complimented or critiqued in this instance? That is the default position when seeking input, so "prompt with more/less humility" amounts to telling you to solve your own problems and then just use AI to confirm your bias, because it will rarely contradict your bias.

amarant 3 hours ago

So what I do when I'm not sure about something is say "I want to achieve X. I was thinking I could solve it by doing Y. What are the pros and cons of this approach, and what alternative solution would you suggest?"

And from there it's an interactive discussion, drilling down on details until I understand the problem and the solutions better.

It definitely challenges my bias when I do this. The one thing it doesn't challenge is the X. Formulate the problem poorly, and you'll get a bad solution. Or rather, you'll end up with a good solution to the wrong problem. Which is even worse than a bad solution to the right problem.

Which is largely why I'm not at all worried about losing my job to AI. It takes some experience to formulate the problem correctly. I don't feel like AI has made me redundant; I'm just way faster than I used to be, and my thinking is more abstract.

A good prompt I'll often use is "Is there an industry-standard solution that is applicable to this problem?" You very rarely want novel solutions. Don't reinvent the wheel just because AI lets you do it 10x as fast. Use a wheel. They're round for a reason.

Sometimes I find it useful to discuss things with a different model. I like Gemini for discussion and Claude for implementation. With Gemini I go about it as a learning session, discussing options and details. I honestly think this is mostly because it compartmentalizes the phases in a natural way for me. One interface for brainstorming and learning, and another for planning and implementing.

Sorry this comment turned into a rather disorganised collection of ramblings, I hope you can extract some kernel of usefulness from it all.

Paracompact an hour ago

Indeed I don't mean to downplay the usefulness that AI can have in the self-evaluation process. It's a wonderful engine for discovering information either general or specific to one's project.

> interactive discussion drilling down on details until I understand the problem and the solutions better.

I think it is fair to call this use of AI something akin to a fusion of a super-competent search engine and a leveled-up rubber duck (https://en.wikipedia.org/wiki/Rubber_duck_debugging). And this is not to downplay the utility of either of those things.

However, one cannot rely on an AI to decide when the details are sufficiently expounded, or when one understands them clearly enough. If one starts hinting that one gets it when one really doesn't, or that one is getting close to having all the pieces together, the AI will not be opinionated enough to contradict that sentiment.

> It definitely challenges my bias when I do this. The one thing it doesn't challenge is the X. Formulate the problem poorly, and you'll get a bad solution.

The best advice an expert can give a beginner is generally in the form of solutions to XY problems (https://en.wikipedia.org/wiki/XY_problem). It is a shame that AIs are rarely opinionated enough to suggest you're not hunting the right thing. And if you explicitly prompt one to consider whether you're facing an XY problem, it usually takes that as a cue to indulge the suspicion regardless of its merit.

I don't think this issue is inherent to LLMs, and I see signs of it improving bit by bit. I can recall the shit-on-a-stick test from about a year ago (https://www.reddit.com/r/ChatGPT/comments/1k920cg/new_chatgp...). More recently, when I asked Claude "Are oyster mushrooms or wine cap mushrooms more capable of high levels of sunlight?", it answered my question while also adding, "Caveat on the comparison: the relevant variable isn't sun per se but moisture retention. A wine cap bed that's kept moist will take far more sun than an exposed oyster log, but a sun-baked, drying bed will fail for either." I think that is a mature amount of pushback to include.

In the end, I still disagree with the notion that subservience is, by default, the right attitude for an LLM to have. An agent spawned specifically for code generation according to a spec? Sure. But in any case where you're trying to refine rather than execute your ideas, you want something that will call you out on your bad ideas.

devin 17 minutes ago

Thanks for writing what I was thinking in response to the above. Namely that the mere suggestion to the LLM that you need a “pro/con list” kicks the bias off, and that’s the problem.

Edit: Well, it's not the whole problem; rather, it's insufficient to overcome the root of the problem.

jstummbillig 3 hours ago

> The flip side of this problem is that it is also easy to phrase a prompt in a way that invites _too much_ criticism, so you wind up sycophantic in the other direction where the completion rejects a perfectly good idea because the prompt leads a little bit in that direction.

I don't think that is the flip side. That's just obviously bad. Anything that is obviously bad, the model makers will also ~notice and work to improve. They seem to be a competent and attentive bunch, on the whole.

aksss 3 hours ago

A good habit to build is knowing when to abandon a session and start over rather than trying to correct it. There's room for correction, but you can kind of smell when the whole discussion has become rotten and inefficient. Sometimes it's better to treat the session as rubber ducking: use it to learn how to articulate what you're after, then start a new session on that clean, correctly articulated foundation.