johnb231 a day ago

The latest models are natively multimodal. Audio, video, images, and text are all tokenised and interpreted by the same model.
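
To make "tokenised and interpreted in the same model" concrete, here is a toy sketch of the idea, not how any particular model does it. Everything in it is an assumption for illustration: the ID ranges, the function names, and the pre-quantised image and audio codes are all hypothetical. Real systems use learned tokenizers or encoders for each modality. The point is only that every input ends up as IDs in one shared vocabulary, so a single model sees one sequence.

```python
# Toy illustration only: real models use learned tokenizers/encoders,
# not hand-made ID ranges like these (all names here are hypothetical).
TEXT_OFFSET = 0      # IDs 0..255: one per UTF-8 byte
IMAGE_OFFSET = 256   # IDs 256..511: one per quantised image-patch code
AUDIO_OFFSET = 512   # IDs 512..767: one per quantised audio-frame code


def tokenize_text(s: str) -> list[int]:
    # Map each UTF-8 byte to a token ID in the text range.
    return [TEXT_OFFSET + b for b in s.encode("utf-8")]


def tokenize_image(patch_codes: list[int]) -> list[int]:
    # patch_codes: assumed to be already quantised to 0..255.
    return [IMAGE_OFFSET + p for p in patch_codes]


def tokenize_audio(frame_codes: list[int]) -> list[int]:
    # frame_codes: assumed to be already quantised to 0..255.
    return [AUDIO_OFFSET + f for f in frame_codes]


def build_sequence(text: str, patch_codes: list[int], frame_codes: list[int]) -> list[int]:
    # One flat sequence over one shared vocabulary: this is what a
    # single model would consume, regardless of the source modality.
    return tokenize_text(text) + tokenize_image(patch_codes) + tokenize_audio(frame_codes)


seq = build_sequence("hi", [3, 7], [1])
print(seq)  # [104, 105, 259, 263, 513]
```

Because the ranges do not overlap, the model can tell the modalities apart from the token IDs alone, while still processing them in the same sequence.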