How would information leak, though? There’s no difference in the probability distribution the model outputs when caching vs not caching.
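To make the premise concrete, here is a minimal sketch. It assumes "caching" means a transformer KV cache, and it uses a toy single-head causal attention layer. Reusing cached keys and values produces the same per-step outputs as recomputing them from scratch over the whole prefix. The function names, dimensions and random weights are all hypothetical and only for illustration.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def outputs_without_cache(xs, Wq, Wk, Wv):
    # No cache: recompute keys and values for the entire prefix at every step.
    outs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        outs.append(attend(matvec(Wq, xs[t]), keys, values))
    return outs


def outputs_with_cache(xs, Wq, Wk, Wv):
    # KV cache: compute each token's key/value once and append it to the cache.
    keys, values, outs = [], [], []
    for x in xs:
        keys.append(matvec(Wk, x))
        values.append(matvec(Wv, x))
        outs.append(attend(matvec(Wq, x), keys, values))
    return outs


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def demo(seed=0, d=4, steps=6):
    # Returns the largest absolute difference between the two code paths.
    rng = random.Random(seed)
    Wq, Wk, Wv = (random_matrix(rng, d, d) for _ in range(3))
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(steps)]
    a = outputs_without_cache(xs, Wq, Wk, Wv)
    b = outputs_with_cache(xs, Wq, Wk, Wv)
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    print(demo())
```

In this toy, both paths run the same arithmetic on the same inputs, so the outputs (and therefore any distribution computed from them) match.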