paradite | 4 hours ago
In theory, auto-regressive models have no inherent limit on context: each next token is generated conditioned on all previous tokens. In practice, when training a model, people fix a context window so that during inference you know how much GPU memory to allocate for a prompt (mainly the KV cache) and can reject any prompt that would exceed that limit. Of course performance also degrades as context gets longer, but I suspect the memory limit is the primary reason we have context window limits.
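
To make the memory argument concrete, here's a rough back-of-the-envelope sketch. All numbers are illustrative assumptions (roughly a 7B Llama-style dense model in fp16, no grouped-query attention), not any specific model's real config:

    # Hypothetical config, roughly 7B-class, fp16, no GQA.
    n_layers = 32        # transformer blocks
    n_kv_heads = 32      # key/value heads
    head_dim = 128       # dimension per head
    bytes_per_elem = 2   # fp16

    # Each token stores one K and one V vector per layer.
    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    print(f"KV cache per token: {kv_bytes_per_token / 2**20:.2f} MiB")  # ~0.5 MiB

    # With a fixed GPU memory budget for the cache, the longest
    # servable prompt is a hard number you must pick up front.
    cache_budget_gib = 8
    max_ctx = int(cache_budget_gib * 2**30 / kv_bytes_per_token)
    print(f"Max context for {cache_budget_gib} GiB budget: {max_ctx} tokens")  # ~16k

Under those assumptions, every token of context costs about half a MiB of KV cache, so an 8 GiB budget caps you at around 16k tokens, which is why a serving stack wants a hard window to size allocations against.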