mamp 3 days ago
Unfortunately, I think the context rot paper [1] found that performance still degraded as context length increased, even in models using attention sinks.
giancarlostoro 3 days ago
Saw that paper, but I haven't had a chance to read it yet. Are there other techniques that help, then? I assume there are a few different ones in use.