| ▲ | rohan_joshi 4 hours ago | |
small models struggle with prosody due to limited capacity. this version does much better than the previous one and is the best among <25MB models. Kokoro is a really good model for its size, and it's competitive on Artificial Analysis too. i think by the next release we should have something with Kokoro quality at a fifth of the size. Adding control for rhythm seems to be quite important too, and we should start looking at that for other languages. | ||
| ▲ | magicalhippo 22 minutes ago | parent [-] | |
Listened to the video examples; they sounded very good, though the text wasn't terribly challenging. If only I could have that in Norwegian, my SO would be pleased. Also, I totally misremembered regarding Kokoro TTS. It's good, but it's not the one that was butchering Norwegian. Forgot which one I was thinking of, maybe it was the old VITS stuff Rhasspy uses. Point stands: the voice was good, but I could barely understand what was said. | ||