paradite | 4 hours ago
In theory, auto-regressive models have no inherent limit on context: each next token is generated conditioned on all previous tokens. In practice, when training a model, people fix a context window so that during inference you know how much GPU memory to allocate for a prompt (mainly the KV cache) and can reject any prompt that would exceed that limit. Of course performance also degrades as context gets longer, but I suspect the memory limit is the primary reason we have context window limits.
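
To make the memory argument concrete, here's a rough back-of-the-envelope sketch. All numbers are illustrative assumptions (roughly a 7B Llama-style dense model in fp16, no grouped-query attention), not any specific model's real config:

    # Hypothetical config, roughly 7B-class, fp16, no GQA.
    n_layers = 32        # transformer blocks
    n_kv_heads = 32      # key/value heads
    head_dim = 128       # dimension per head
    bytes_per_elem = 2   # fp16

    # Each token stores one K and one V vector per layer.
    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    print(f"KV cache per token: {kv_bytes_per_token / 2**20:.2f} MiB")  # ~0.5 MiB

    # With a fixed GPU memory budget for the cache, the longest
    # servable prompt is a hard number you must pick up front.
    cache_budget_gib = 8
    max_ctx = int(cache_budget_gib * 2**30 / kv_bytes_per_token)
    print(f"Max context for {cache_budget_gib} GiB budget: {max_ctx} tokens")  # ~16k

Under those assumptions, every token of context costs about half a MiB of KV cache, so an 8 GiB budget caps you at around 16k tokens, which is why a serving stack wants a hard window to size allocations against.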