| ▲ | Paracompact 2 hours ago | |
Indeed, I don't mean to downplay the usefulness that AI can have in the self-evaluation process. It's a wonderful engine for discovering information, whether general or specific to one's project.

> interactive discussion drilling down on details until I understand the problem and the solutions better.

I think it is fair to call this use of AI a fusion of a super-competent search engine and a leveled-up rubber duck (https://en.wikipedia.org/wiki/Rubber_duck_debugging). And this is not to downplay the utility of either of those things. However, one cannot rely on an AI to decide when the details are sufficiently expounded, or when one understands them clearly enough. If one starts hinting that one gets it when one really doesn't, or that one is close to having all the pieces together, the AI will not be opinionated enough to contradict that sentiment.

> It definitely challenges my bias when I do this.

The one thing it doesn't challenge is the X. Formulate the problem poorly, and you'll get a bad solution. The best advice an expert can give a beginner generally takes the form of solutions to XY problems (https://en.wikipedia.org/wiki/XY_problem). It is a shame that AIs are rarely opinionated enough to suggest you're not hunting the right thing. And if you do explicitly prompt one to consider whether yours is an XY problem, it usually takes that as a cue to indulge the suspicion regardless of its merit.

I don't think this is an issue inherent to LLMs, and I see signs of it improving bit by bit. I can recall the shit-on-a-stick test about a year ago (https://www.reddit.com/r/ChatGPT/comments/1k920cg/new_chatgp...), and when I most recently asked Claude "Are oyster mushrooms or wine cap mushrooms more capable of high levels of sunlight?" it answered my question while also adding, "Caveat on the comparison: the relevant variable isn't sun per se but moisture retention. A wine cap bed that's kept moist will take far more sun than an exposed oyster log, but a sun-baked, drying bed will fail for either", which I think is a mature amount of pushback to include.

In the end I still disagree with the notion that subservience is, by default, the right attitude for an LLM to have. An agent spawned specifically for code generation according to a spec? Sure. But in any case where you're trying to refine rather than execute your ideas, you want something to call you out on your bad ideas. | ||
| ▲ | devin an hour ago | |
Thanks for writing what I was thinking in response to the above: namely, that the mere suggestion to the LLM that you need a “pro/con list” kicks the bias off, and that’s the problem. Edit: Well, not the whole problem; rather, it's insufficient to overcome the root of the problem. | ||