noosphr 3 hours ago

simonw 2 hours ago | parent
To save anyone else the click, this is the paper "On The Computational Complexity of Self-Attention" from September 2022, with authors from NYU and Microsoft. It argues that because the self-attention mechanism in transformers has every token "attend to" every other token in the sequence, its cost grows quadratically (n^2) with input length, which should limit the total context length available to models.

This would explain why the top models have been stuck at 1 million tokens since Gemini 1.5 in February 2024. (There has been a 2 million token Gemini, but it's not in wide use, and Meta claimed their Llama 4 Scout could do 10 million, but I don't know of anyone who's seen that actually work.)

My counter-argument here is that Claude Opus 4.5 has a comparatively tiny 200,000 token window, which turns out to work incredibly well for the kinds of coding problems we're throwing at it when accompanied by a cleverly designed harness such as Claude Code. So this limit from 2022 has been less "disappointing" than people may have expected.
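For anyone who wants to see where the n^2 comes from, here's a minimal sketch of single-head scaled dot-product attention in NumPy. The function name and shapes are illustrative, not taken from the paper; the point is that the score matrix every token computes against every other token is n-by-n, so both time and memory grow quadratically with sequence length.

    # Minimal sketch: naive single-head scaled dot-product attention.
    # Names and shapes are illustrative assumptions, not the paper's notation.
    import numpy as np

    def naive_attention(Q, K, V):
        # Q, K, V: (n, d) arrays for a sequence of n tokens.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)   # (n, n): every token scored against every other token
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V              # (n, d) output

    n, d = 1024, 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    out = naive_attention(Q, K, V)
    # The (n, n) score/weight matrix is the quadratic bottleneck:
    # doubling n quadruples its size.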