▲ | bondarchuk a day ago | |
I don't understand how this would happen, though. It's not like it will mishear a dutch sentence as if it's english; it will correctly pick up the dutch sentence, but (since the language is auto-detected as english at the start of the segment), seemingly auto-translate that (correct and correctly heard) dutch text to english. All we need is a way to get the dutch text that's surely somewhere in there, before the translation happens. Unless it was trained end-to-end on dutch-subtitled english text?? Which might make the translation a somewhat inextricable part of the model..? Does anyone know? | ||
▲ | busup 9 hours ago | parent [-] | |
Maybe try the turbo model which is transcription only. The other models were trained on x to en translations and they seem to emphasise the output language over the task token. You can get them to translate to any language even though it was never trained for that, comparatively nl-en translation is in the dataset so I'm not surprised it's doing that. |