Inference keeps getting cheaper, both because hardware costs are falling and because more efficient techniques such as latent attention are spreading.