Do we have any performance benchmark with token length? Now that the context size is 1 M. I would want to know if I can exhaust all of that or should I clear earlier?