grej 4 days ago:
Related to this, is anyone aware of a benchmark for this kind of thing, maybe broadly the category of "context rot"? Something that tracks how material irrelevant to the current question degrades responses, and how a large volume of relevant but deep context leaves the model unable to follow the conversation. I've definitely experienced the latter with coding models.
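For illustration, here is a minimal sketch of what such a probe could look like: ask the same question with increasing amounts of irrelevant filler around the relevant fact and track accuracy. The `ask_model` callable, the QA pairs, and the filler text are all hypothetical placeholders, not an existing benchmark.

```python
import random

# Minimal "context rot" probe: bury a relevant fact inside N lines of
# irrelevant filler and see whether the model still answers correctly.
# QA_PAIRS and FILLER are toy data; ask_model is whatever chat call you use.

QA_PAIRS = [
    ("The invoice number for order A-113 is 55821.",
     "What is the invoice number for order A-113?", "55821"),
    ("The deploy script lives in scripts/release.sh.",
     "Which file contains the deploy script?", "scripts/release.sh"),
]

FILLER = [
    "Reminder: the office coffee machine is being serviced on Tuesday.",
    "Unrelated note: the Q3 marketing deck was moved to the shared drive.",
    "FYI, the parking garage switches to winter hours next month.",
]

def build_prompt(fact: str, question: str, n_distractors: int) -> str:
    """Hide the relevant fact somewhere inside n_distractors filler lines."""
    lines = [random.choice(FILLER) for _ in range(n_distractors)]
    lines.insert(random.randrange(len(lines) + 1), fact)
    return "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer briefly."

def run_probe(ask_model, distractor_levels=(0, 10, 50, 200)) -> dict:
    """Return accuracy at each level of irrelevant context."""
    results = {}
    for n in distractor_levels:
        correct = 0
        for fact, question, answer in QA_PAIRS:
            reply = ask_model(build_prompt(fact, question, n))
            correct += int(answer.lower() in reply.lower())
        results[n] = correct / len(QA_PAIRS)
    return results

if __name__ == "__main__":
    # Stub model so the sketch runs standalone; swap in a real API call.
    echo_model = lambda prompt: "55821" if "A-113" in prompt else "scripts/release.sh"
    print(run_probe(echo_model))
```

Plotting accuracy against the number of distractor lines (or total prompt tokens) would give a rough curve of how quickly a given model degrades as the context fills with noise.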
energy123 4 days ago:
In computer vision, noise is added to the images during training. Maybe LLM providers should do something similar during RL.
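A rough sketch of the analogy: Gaussian pixel noise is a standard vision augmentation, and the text-side equivalent might be splicing irrelevant sentences into training prompts so the model learns to ignore them. The function names and distractor text here are illustrative, not any provider's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_pixel_noise(image: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Classic CV augmentation: additive Gaussian noise, clipped to [0, 1]."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

DISTRACTORS = [
    "Note to self: renew the domain registration.",
    "The cafeteria menu changes on Friday.",
]

def add_context_noise(prompt: str, p: float = 0.5) -> str:
    """Hypothetical text analog: splice irrelevant sentences into the prompt."""
    out = []
    for line in prompt.split("\n"):
        if rng.random() < p:
            out.append(str(rng.choice(DISTRACTORS)))
        out.append(line)
    return "\n".join(out)

if __name__ == "__main__":
    img = rng.random((4, 4))            # toy 4x4 grayscale "image"
    print(add_pixel_noise(img).shape)   # (4, 4)
    print(add_context_noise("Summarize the bug report.\nThen propose a fix."))
```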
nijave 4 days ago:
Not sure, but it sounds like a very similar problem to prompt injection.