More tokens in the context means disproportionately more VRAM, to the extent that you really do need multiple GPUs if you're running an interestingly-sized model.