You still need to do a forward pass per token. With massive batching and full pipelining you might be able to break the dependencies and output one token per cycle but clearly they aren't doing that.