namibj 4 days ago |
The problem is that all current mainstream LLMs are autoregressive decoder-only models, mostly (but not exclusively) transformers. Because of the causal mask, their math cannot apply a modifier like "that example/attempt was wrong because of X, Y, Z" to anything that came before the modifier clause in the prompt. However enticing these models are to train, that limitation is inherent.

For this specific situation, the usual advice is to go back to just before the wrong output and edit your message to reflect what you've since learned, because a confidently wrong output with no correcting clause in front of it will "pollute the context": the model retrieves aspects of the context that are encoded into higher-layer token embeddings, those embeddings can't carry the "this was wrong" marker because the correction came after them, so it pulls in the confidently wrong tokens anyway and spews even more BS on top of them. And much like telling a GPT-2/GPT-3 model that it's an expert on $topic actually made it better at that topic, affirming that the model made an error primes it to behave in a way that gets it yelled at again... sadly.
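To make the causal-masking point concrete, here's a minimal sketch, not anyone's production code: plain numpy, a single attention "head" with no learned weights, random vectors standing in for token embeddings (all of that is my own illustrative setup). It shows that appending a "correction" after some tokens leaves the attention outputs for the earlier positions bit-for-bit unchanged, which is exactly why the correction can't retroactively tag the wrong tokens as wrong.

    import numpy as np

    def causal_self_attention(x):
        # Toy single-head self-attention with a causal mask; queries, keys
        # and values are just x itself to keep the sketch minimal.
        T, d = x.shape
        scores = x @ x.T / np.sqrt(d)                     # (T, T) attention logits
        mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where j > i
        scores[mask] = -np.inf                            # position i can't see later tokens
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ x                                # contextualized embeddings

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(6, 8))       # 6 "tokens" of the confidently wrong answer

    out_before = causal_self_attention(tokens)

    # Now append a "that was wrong because X, Y, Z" correction after the fact:
    corrected = np.vstack([tokens, rng.normal(size=(2, 8))])
    out_after = causal_self_attention(corrected)

    # The representations of the original 6 tokens are unchanged -- the
    # correction only influences positions that come after it.
    assert np.allclose(out_before, out_after[:6])
    print("earlier-token embeddings unchanged by later correction:",
          np.allclose(out_before, out_after[:6]))

Stacking more layers doesn't change this: every layer uses the same causal mask, so nothing downstream of the wrong tokens can ever flow back into their representations.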