rohan_joshi 4 hours ago

small models struggle with prosody due to limited capacity. this version does much better than the previous one and is the best among <25MB models. Kokoro is a really good model for its size, and it's competitive on artificial analysis too. i think by the next release we should have something of kokoro quality at a fifth of the size. Adding control for rhythm seems to be quite important too, and we should start looking at that for other languages.

magicalhippo 22 minutes ago | parent [-]

Listened to the video examples; they sounded very good, though the text wasn't terribly challenging.

If only I could have that in Norwegian my SO would be pleased.

Also, I totally misremembered regarding Kokoro TTS. It's good, but it's not what was butchering Norwegian. I forgot which one I was thinking of; maybe it was the old VITS stuff Rhasspy uses. The point stands: the voice was good, but I could barely understand what was said.