crossroadsguy 5 days ago
It does or says something wrong, you give it feedback, and then it's a loop. Often it just doesn't get it. You supply it webpages (text-only webpages, which it can easily read, or so I hope). It says it got it, and in the next line the output is the old wrong answer again. There are worse examples, but here is one (I am "making this up" :D to give you an idea):

> To list hidden files you have to use "ls -h", you can alternatively use "ls --list".

Of course you correct it, try to reason with it, and then supply a good old man page URL. After a few rounds it concedes, and then it gives you the same answer again:

> You were correct in pointing the error out. To list the hidden files you indeed have to type "ls -h" or "ls --list"

And this is really just a mild example.
weitendorf 5 days ago | parent | next
I suspect you are interacting with LLMs in a single, long conversation corresponding to your "session", prompting fixes/new info/changes in direction between tasks. This is a very natural and common way to interact with LLMs, but IMO it's also one of the biggest avoidable causes of poor performance.

Every time you send a message to an LLM, you actually send the entire conversation history. Most of the time a large portion of that information is no longer relevant, and sometimes it is wrong-but-corrected-later; both are more confusing to LLMs than to us because of the way attention works. The same applies to changes in the current task/objective or instructions: the more outdated, irrelevant, or inconsistent they are, the more confused the LLM becomes.

LLMs are also prone to the Purple Elephant problem (just like humans): the best way to get them not to think about purple elephants is to not mention them at all, as opposed to explicitly instructing them not to reference purple elephants. When they encounter errors, they are biased toward the previous assumptions/approaches they laid out earlier in the conversation.

I generally recommend using many short per-task conversations, each with as little irrelevant or conflicting context as possible. This is especially helpful for fixing non-trivial LLM-introduced errors, because it reframes the task and eliminates the LLM's bias toward the "thinking" that caused it to introduce the bug in the first place.
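To make the "you re-send everything" point concrete, here's a rough sketch assuming an OpenAI-style Python chat client; the client setup, model name, and prompts are placeholders, not anything from this thread:

    # Sketch: how chat history accumulates. Every call re-sends the whole
    # `messages` list, including stale corrections and old wrong answers,
    # which keep pulling the model's attention back to them.
    from openai import OpenAI  # assumed OpenAI-style client

    client = OpenAI()
    messages = [{"role": "system", "content": "You are a helpful shell expert."}]

    def ask(user_text):
        messages.append({"role": "user", "content": user_text})
        resp = client.chat.completions.create(
            model="gpt-4o-mini",    # placeholder model name
            messages=messages,      # full history goes out on every turn
        )
        reply = resp.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        return reply

    # Per-task reset: starting a fresh `messages` list for each new task
    # drops irrelevant/conflicting context instead of dragging it along.

The reset at the end is the "many short per-task conversations" recommendation in code form.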
logicprog 5 days ago | parent | prev
Hi from the other thread :P

If you'll forgive me putting my debugging hat on for a bit (solving problems is what most of us do here): I wonder if it's not actually reading the URL, and maybe that's the source of the problem, because I've had a lot of success feeding manuals and such to AIs and then asking them to synthesize commands or answer questions about them.

Also, I just tried asking Gemini 2.5 Flash this and it did a web search, found a source, answered my question correctly (ls -a, or ls -la for more detail), and linked me to the precise part of the source it referenced: https://kinsta.com/blog/show-hidden-files/#:~:text=If%20you'... (this is the precise link it gave me).