preetsojitra 9 hours ago

Meta's Perception Encoder Audio-Visual is CLIP-like but has three modalities: audio, video, and text.