elorant 6 days ago
From my experience, the context window by itself tells only half the story. Load a big document that's 200k tokens and ask it a question, and it will answer just fine. But start a conversation that balloons past 100k and it starts losing coherence pretty quickly. So I guess batch size plays a more significant role.
IceHegel 4 days ago | parent
By batch size, do you mean the number of tokens in the context window that were generated by the model vs. external tokens? Because my understanding is that, however you get to 100K, the 100,001st token is generated the same way as far as the model is concerned.
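For what it's worth, here's a minimal sketch of that point (using GPT-2 via Hugging Face transformers purely as a small stand-in model, not whatever the parent was actually running): at generation time the model sees one flat token sequence, so the next token is predicted the same way whether the earlier tokens came from a pasted document or from the model's own previous replies.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Small stand-in model, purely illustrative; a real long-context model works
    # the same way at this level: one flat sequence in, next-token logits out.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    # Tokens from a pasted document and tokens the model generated earlier
    # end up concatenated into the same context; the model can't tell them apart.
    doc_ids = tokenizer("A long pasted document ...", return_tensors="pt").input_ids
    reply_ids = tokenizer(" An earlier model reply ...", return_tensors="pt").input_ids
    context = torch.cat([doc_ids, reply_ids], dim=1)

    with torch.no_grad():
        logits = model(context).logits         # shape: [1, seq_len, vocab_size]
    next_token_id = logits[0, -1].argmax()     # greedy pick of token N+1
    print(tokenizer.decode(next_token_id.item()))

So mechanically the 100,001st token is just argmax/sampling over logits conditioned on the whole preceding sequence; any difference in behavior has to come from what those preceding tokens contain, not from who produced them.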