Agree! I’m not an AI engineer or researcher, but it always struck me as odd that on every step we serialise the model’s huge high-dimensional latent state (the hidden activations, not the 100B-odd parameters themselves) down to a single discrete token, round-tripping everything through a context of at most ~1M tokens.
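To make the bottleneck concrete, here's a minimal sketch of a standard greedy decoding loop in Python; `model`, its `forward`/`unembed` methods, and the shapes are all hypothetical placeholders, not any particular library's API:

```python
import torch

def generate(model, token_ids, steps):
    for _ in range(steps):
        # The forward pass produces a rich high-dimensional hidden state
        # for every position (thousands of floats per token)...
        hidden = model.forward(token_ids)   # [seq_len, d_model]
        logits = model.unembed(hidden[-1])  # [vocab_size]
        # ...but each step collapses all of that down to a single
        # discrete token id before anything gets fed back in.
        next_id = torch.argmax(logits)      # one int out of ~100k choices
        token_ids = torch.cat([token_ids, next_id.unsqueeze(0)])
    return token_ids
```

Everything the model "thought" at a step survives only as that one token id; the next step has to rebuild its latent state from the token sequence alone.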