SpaceManNabs 4 hours ago
> No transcription, no frame captioning, no intermediate text.

If there is text in the video (like a caption or whatever), will the embedding capture that? Never thought about this before. If the video has audio, does the embedding capture that too?
sohamrj 4 hours ago
Yes to both. The embedding is over raw video frames, so anything visible (text, signs, captions) gets captured in the vector. And Gemini Embedding 2 extracts the audio track and embeds it alongside the visual frames. So a query like 'someone yelling' would theoretically match on audio. My dashcam footage doesn't have audio though, so I haven't tested that side yet.
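For anyone wondering what "match" means in practice: a text query and each video clip both end up as vectors, and search is typically a nearest-neighbor ranking by cosine similarity. Below is a minimal sketch of just that ranking step. It does not call any embedding API; the three-dimensional vectors, the clip names, and the `rank_clips` helper are all made up for illustration, and real embeddings would have far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_clips(query_vec, clip_vecs):
    """Return (clip_name, score) pairs, best match first (hypothetical helper)."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in clip_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for clip embeddings. In a real setup these would come from
# the embedding model; the values here are invented.
clips = {
    "clip_quiet_road": [0.9, 0.1, 0.0],
    "clip_stop_sign": [0.1, 0.9, 0.1],
    "clip_yelling": [0.0, 0.2, 0.95],
}

# Toy stand-in for the embedding of a text query such as 'someone yelling'.
query = [0.05, 0.1, 0.9]

ranking = rank_clips(query, clips)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

If the audio really is folded into the same vector as the frames, a clip with yelling in it would simply land closer to that query in this ranking, with no separate audio index needed.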