▲ | donatj a day ago
I know nothing about Whisper. Is it usable for automated translation? I own a couple of very old Japanese movies that, as far as I'm aware, have never been translated. I don't speak Japanese, but I'd love to watch them. A couple of years ago I was negotiating with a guy on Fiverr to translate them. At his usual rate per minute of footage it would have cost thousands of dollars, but I'd negotiated him down to a couple hundred before he presumably got sick of me and ghosted me.
▲ | ethan_smith 21 hours ago
Whisper can indeed transcribe Japanese and translate it to English, though quality varies by dialect and audio clarity. You'll need the "large-v3" model for best results, and you can use ffmpeg's new integration with a command like `ffmpeg -i movie.mp4 -af whisper=model=large-v3:task=translate output.srt`.
▲ | neckro23 18 hours ago
In my experience it works OK. The "English" model actually knows a lot of languages and will translate directly to English. You can also transcribe to Japanese and then use a translator to convert the result to English, which can sometimes help with more semantically complex dialogue. For example, faster-whisper-xxl [1] can be used either way: translating directly, or transcribing to Japanese first and translating afterwards.
1. https://github.com/Purfview/whisper-standalone-win
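A rough sketch of those two routes, assuming the `faster-whisper` Python library rather than the faster-whisper-xxl command line (whose exact flags are not shown here). The function names `run_whisper`, `segments_to_srt`, and `srt_timestamp`, the file name `movie.mp4`, and the choice of `large-v3` are illustrative assumptions, and the separate Japanese-to-English text translation step is left out because it depends on whichever translator you pick.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) triples as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


def run_whisper(audio_path: str, task: str, model_size: str = "large-v3"):
    """Run Whisper on Japanese audio.

    task="translate" yields English text directly;
    task="transcribe" yields Japanese text to feed to another translator.
    """
    # Imported lazily so the SRT helpers above work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path, language="ja", task=task)
    return [(seg.start, seg.end, seg.text) for seg in segments]


# Hypothetical usage (downloads the model on first run):
#   english = run_whisper("movie.mp4", task="translate")
#   open("movie.en.srt", "w", encoding="utf-8").write(segments_to_srt(english))
#
#   japanese = run_whisper("movie.mp4", task="transcribe")
#   open("movie.ja.srt", "w", encoding="utf-8").write(segments_to_srt(japanese))
```

With the second route, the Japanese SRT keeps its timestamps, so only the text lines need to go through a translator afterwards.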
▲ | prmoustache a day ago
My personal experience trying to transcribe (not translate) was a complete failure. It would invent stuff. It would also get completely lost when more than one language was used. It also doesn't understand context, so it makes a lot of the errors you see in automatic translations of videos on YouTube, for example.
▲ | BetterWhisper 18 hours ago
Hey, indeed Whisper can transcribe Japanese and even translate it (but only to English). For the best results you need to use the largest model, which may be slow or fast depending on your hardware. Another option is to use something like VideoToTextAI, which lets you transcribe it fast and then translate it into 100+ languages, after which you can export the subtitle (SRT) file.
▲ | trenchpilgrim a day ago
Whisper has quite bad issues with hallucination. It will inject sentences that were never said in the audio. It's decent for classification but poor at transcription.
▲ | _def a day ago
May I ask which movies? I'm just curious.
▲ | poglet a day ago
Yep, Whisper can do that. You can also try whisperX (https://github.com/m-bain/whisperX) for a possibly better experience with aligning subtitles to the spoken words.