I don't think we can say that until we hear how Genie3 and Veo3 were trained. My hunch is that the next-gen multimodal models, ones combining world, video, text, and image modeling, can only be trained on the best chips.