devinprater 5 hours ago

Audio-described YouTube, please? That'd be so amazing! Even if I couldn't play Zelda yet, I could listen to a playthrough with Gemini describing it.

SXX 2 hours ago

BTW, I asked for a detailed narrative description of another Zelda video, a pure benchmark one, using 5-second snapshots:

Video: Zelda TOTK, R5 5600X, GTX 1650, 1080p 10 Minute Gameplay, No Commentary

https://www.youtube.com/watch?v=wZGmgV-8Rbo

The narrative description source and the command can be found here:

https://gist.github.com/ArseniyShestakov/47123ce2b6b19a8e6b3...

Then I converted it into a narrative voice-over with Gemini 2.5 Pro TTS:

https://drive.google.com/file/d/1Js2nDtM7sx14I43UY2PEoV5PuLM...

It's somewhat desynced from the original video, and the voice-over takes nine and a half minutes instead of the video's 10, but the description of what is happening on screen is quite accurate.

PS: I used 144p video, so details could also be messed up because of the poor quality. And of course I specifically asked for a narrative-like description.
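
One rough way to deal with the length mismatch mentioned above is to slow the voice-over uniformly so it lasts as long as the video. The sketch below is only an illustration, not what was actually done here: the file names are hypothetical, the durations are the approximate figures from this comment, and it assumes ffmpeg with its `atempo` audio filter is available. A uniform stretch only fixes the total length; it will not correct drift between individual scenes.

```python
# Approximate durations from the comment above (assumed, not measured):
# about 9.5 minutes of narration against a 10-minute video.
VOICE_OVER_SECONDS = 9.5 * 60
VIDEO_SECONDS = 10 * 60


def tempo_factor(audio_seconds: float, target_seconds: float) -> float:
    """Return the playback-speed multiplier that makes the audio last target_seconds.

    A value below 1.0 slows the audio down (it gets longer),
    a value above 1.0 speeds it up (it gets shorter).
    """
    if audio_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / target_seconds


def ffmpeg_command(src: str, dst: str, factor: float) -> list[str]:
    """Build an ffmpeg invocation that changes tempo without changing pitch."""
    # File names passed in are placeholders; nothing is executed here.
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={factor:.4f}", dst]


factor = tempo_factor(VOICE_OVER_SECONDS, VIDEO_SECONDS)
print(f"tempo factor: {factor:.4f}")
print(" ".join(ffmpeg_command("voiceover.wav", "voiceover_10min.wav", factor)))
```

Getting per-scene sync would probably need the narration generated in timestamped chunks and each chunk fitted to its own time window.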

SXX 5 hours ago

Hey, I just ran a simple test on a downloaded 5-minute YouTube video by uploading it to the Gemini app.

Source video title: Zelda: Breath of the Wild - Opening five minutes of gameplay

https://www.youtube.com/watch?v=xbt7ZYdUXn8

Prompt:

   Please describe what happening in each scene of this video.
   
   List scenes with timestamp, then describe separately:
   - Setup and background, colors
   - What is moving, what appear
   - What objects in this scene and what is happening,
   
   Basically make desceiption of 5 minutes video for a person who cant watch it.

The result is in a GitHub gist, since there is too much text:

https://gist.github.com/ArseniyShestakov/43fe8b8c1dca45eadab...

I'd say this is quite accurate.

SXX 4 hours ago

Another example, with a completely random 10-minute benchmark video from Tears of the Kingdom:

https://gist.github.com/ArseniyShestakov/47123ce2b6b19a8e6b3...