qsort 5 hours ago
> what did i get wrong here?

You don't know how an LLM works, and you are operating on flawed anthropomorphic metaphors. Ask a frontier LLM what a context window is; it will tell you.
Palmik 4 hours ago
It's a fair question, even if it might be coming from a place of misunderstanding. For example, DeepSeek 3.2, which employs sparse attention [1], is not only faster with long context than the regular 3.1, but also seems to be better (perhaps thanks to reduced noise?).

[1] It still uses a quadratic router, but the router is small, so it scales well in practice. https://api-docs.deepseek.com/news/news250929
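To make the "small quadratic router" idea concrete, here's a minimal NumPy sketch of top-k sparse attention, assuming a toy single-head setup. The names (q_lite, k_lite, top_k) and the dimensions are illustrative, not DeepSeek's actual design; the real DSA indexer differs in many details, and causal masking is skipped for brevity.

    import numpy as np

    def softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    def sparse_attention(q, k, v, q_lite, k_lite, top_k):
        # Router: still quadratic (n x n scores), but over a tiny
        # head dim, so it is cheap relative to full attention.
        lite_scores = q_lite @ k_lite.T                       # (n, n)
        # Each query keeps only its top_k highest-scoring keys.
        keep = np.argpartition(-lite_scores, top_k - 1, axis=-1)[:, :top_k]
        d = q.shape[-1]
        out = np.empty_like(q)
        for i in range(q.shape[0]):
            ks, vs = k[keep[i]], v[keep[i]]                   # (top_k, d)
            w = softmax(q[i] @ ks.T / np.sqrt(d))             # (top_k,)
            out[i] = w @ vs                                   # (d,)
        return out

    rng = np.random.default_rng(0)
    n, d, d_lite = 512, 64, 8   # full head dim vs. tiny router dim
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    q_lite, k_lite = (rng.standard_normal((n, d_lite)) for _ in range(2))
    print(sparse_attention(q, k, v, q_lite, k_lite, top_k=32).shape)  # (512, 64)

The point of the design: the full softmax attention only ever touches top_k keys per query (O(n * top_k)), while the quadratic part is confined to the low-dimensional router scores.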
ed 4 hours ago
Parent is likely thinking of sparse attention, which allows a significantly longer context to fit in memory.
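Back-of-envelope on the memory claim, assuming the per-head score matrix were naively materialized in fp16 (production kernels tile this rather than materializing it, but the quadratic-vs-linear scaling argument is the same; n and top_k are made-up illustrative numbers):

    n = 128_000                          # context length (tokens), illustrative
    bytes_fp16 = 2
    dense = n * n * bytes_fp16           # full n x n scores: ~32.8 GB per head
    top_k = 2_048                        # keys kept per query, illustrative
    sparse = n * top_k * bytes_fp16      # top-k scores only: ~0.52 GB per head
    print(f"dense: {dense/1e9:.1f} GB, sparse: {sparse/1e9:.2f} GB")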
| ||||||||