simonw 7 hours ago
Yeah, I agree. I ran Whisper (via MacWhisper) on the same video and got back accurate timestamps. The big benefit of Gemini here is that it appears to do a great job of speaker recognition, and it can also identify when people interrupt each other or raise their voices. The best solution would likely combine the two: Gemini for speaker identification and tone of voice, and Whisper, NVIDIA Parakeet, or something similar for the transcription with timestamps.
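Here's a rough sketch of how the "mixture" could be glued together. It assumes you already have word-level timestamps from Whisper (or Parakeet) and a list of speaker-labelled turns from Gemini whose own timestamps you don't trust. Rather than matching on time, it aligns the two transcripts on their text using Python's `difflib.SequenceMatcher` and copies Whisper's timestamps onto Gemini's turns. The `Word`/`Turn` types and the `align_turns` function are hypothetical names for illustration, not part of either tool's API.

```python
from __future__ import annotations

import difflib
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One word with timestamps, e.g. taken from a Whisper-style transcript (hypothetical type)."""
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Turn:
    """One speaker-labelled turn, e.g. taken from Gemini's output (hypothetical type)."""
    speaker: str
    text: str
    start: Optional[float] = None
    end: Optional[float] = None


def normalize(token: str) -> str:
    # Lowercase and drop punctuation so "Yeah," matches "yeah".
    return re.sub(r"[^\w']", "", token.lower())


def align_turns(words: List[Word], turns: List[Turn]) -> List[Turn]:
    """Return copies of `turns` with start/end filled in from text-matched `words`."""
    # Flatten the turns into normalized tokens, remembering which turn owns each token.
    turn_tokens: List[str] = []
    owner: List[int] = []
    for i, turn in enumerate(turns):
        for tok in turn.text.split():
            norm = normalize(tok)
            if norm:
                turn_tokens.append(norm)
                owner.append(i)
    word_tokens = [normalize(w.text) for w in words]

    # Longest-matching-block alignment between the two token sequences.
    matcher = difflib.SequenceMatcher(None, turn_tokens, word_tokens, autojunk=False)
    result = [Turn(t.speaker, t.text) for t in turns]  # don't mutate the input
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            turn = result[owner[block.a + k]]
            w = words[block.b + k]
            # Widen the turn's time span to cover every matched word.
            turn.start = w.start if turn.start is None else min(turn.start, w.start)
            turn.end = w.end if turn.end is None else max(turn.end, w.end)
    return result


if __name__ == "__main__":
    words = [Word("Yeah", 0.0, 0.3), Word("I", 0.3, 0.4), Word("agree.", 0.4, 0.8),
             Word("Me", 1.0, 1.2), Word("too!", 1.2, 1.5)]
    turns = [Turn("Speaker A", "Yeah, I agree."), Turn("Speaker B", "Me too!")]
    for t in align_turns(words, turns):
        print(t.speaker, t.start, t.end, t.text)
```

This only works as well as the two transcripts agree on wording: if Gemini paraphrases or drops filler words, fewer tokens match, and a turn with no matches at all keeps `None` for its timestamps.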