Why is this the case?
Are there any architectures that don't rely on feeding the entire conversation history back into the model on every turn?
Recurrent LLMs, for example?
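For context, the contrast being asked about can be sketched as follows. This is a purely illustrative toy (the function names, the hash-based state update, and the reply strings are all made up, not any real LLM API): a stateless transformer-style chat loop whose input grows with the whole transcript, versus a recurrent-style loop that carries only a fixed-size state forward.

```python
def stateless_reply(history):
    """Transformer-style chat: the entire transcript is re-fed every turn,
    so per-turn cost grows with conversation length."""
    prompt = "\n".join(history)  # whole history becomes the input again
    return f"reply (saw {len(history)} messages)"

def recurrent_step(state, message):
    """Recurrent-style: only a fixed-size state plus the newest message
    are processed; the transcript itself is never re-read."""
    # Stand-in for a hidden-state update; a real RNN would apply learned weights.
    state = (state + hash(message)) % 2**32
    return state, f"reply (state={state:#x})"

history = []
state = 0
for msg in ["hi", "how do LLMs work?", "thanks"]:
    history.append(msg)
    stateless = stateless_reply(history)   # input size grows each turn
    state, recurrent = recurrent_step(state, msg)  # input size stays constant
```

The point of the sketch is only the shape of the data flow: in the first loop the input is the whole `history`, in the second it is a constant-size `state` plus one message.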