That's a pretty crazy requirement for something to be "useful", especially something that runs so efficiently on CPU. Many content creators from non-English-speaking countries can benefit from this type of release: they can translate the transcripts of their content into English and then run them through a model like this to dub their videos in a language that reaches many more people.

ethin 6 hours ago

Uh, no? This is not at all an absurd requirement. Screen readers literally do this all the time, with voices built the classic way speech synthesizers are made, no AI required. eSpeak is an example, as is MS OneCore. The NVDA screen reader has an option for automatic language switching, as does pretty much every other modern screen reader in existence. And absolutely none of these use AI models to do that switching, either.

kube-system 3 hours ago

They didn’t say it was a crazy requirement. They said it was crazy to consider it useless without meeting that requirement.

ethin 2 hours ago

That doesn't really change what I said, though. It isn't crazy to call it useless without some form of ALS (automatic language switching), either, given that old-school synthesis has been able to do it for like 20 years or so.

phoronixrly 7 hours ago

You mean YouTubers? And they'd have to (manually) synchronise the text to their videos, especially when YouTube apparently offers voice-to-voice translation out of the box, to my and many others' annoyance?

littlestymaar 11 minutes ago

YouTube's voice-to-voice is absolutely horrible, though. Giving YouTubers the ability to clone their own voices would make it much, much more appealing.