giancarlostoro | 3 days ago
Here's a paper from MIT that covers how this could be resolved in an interesting fashion: https://hanlab.mit.edu/blog/streamingllm The AI field is reusing existing CS concepts that we never had the hardware for until now, and these people are learning how applied Software Engineering can make their theoretical models more efficient. It's kind of funny; I've seen this in tech over and over. People discover a new thing, then optimize it using a known thing.
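The gist of the StreamingLLM trick, as I understand the post: keep the first few "attention sink" tokens plus a sliding window of recent tokens in the KV cache, and evict everything in between. A minimal sketch of that eviction policy (the names `evict`, `n_sink`, and `window` are mine, not from the paper's code):

```python
def evict(cache, n_sink=4, window=8):
    """Return the KV-cache positions to retain: the first n_sink
    'attention sink' tokens plus a sliding window of recent tokens."""
    if len(cache) <= n_sink + window:
        return list(cache)          # nothing to evict yet
    return cache[:n_sink] + cache[-window:]

# Token positions 0..19 with 4 sinks and a window of 8:
kept = evict(list(range(20)))
print(kept)  # [0, 1, 2, 3, 12, 13, ..., 19]
```

The counterintuitive finding is that those first tokens matter even when they carry no useful content: the model dumps surplus attention mass on them, so dropping them wrecks the attention distribution.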
kridsdale3 | 3 days ago | parent | next
The fact that this is happening is where the tremendous opportunity to make money as an experienced Software Engineer currently lies. For instance, a year or two ago the AI people discovered "cache". Imagine how many millions the people who implemented it earned for that one.
mamp | 3 days ago | parent | prev
Unfortunately, I think the context rot paper [1] found that performance still degrades as context grows, even in models that use attention sinks.