| ▲ | hexaga 2 hours ago | |
Kokoro is fine tunable? Speaking as someone who went down the rabbit hole... it's really not. There's no (as of last time I checked) training code available so you need to reverse engineer everything. Beyond that the model is not good at doing voices outside the existing voicepacks: simply put, it isn't a foundation model trained on internet scale data. It is made from a relatively small set of focused, synthetic voice data. So, a very narrow distribution to work with. Going OOD immediately tanks perceptual quality. There's a bunch of inference stuff though, which is cool I guess. And it really is a quite nice little model in its niche. But let's not pretend there aren't huge tradeoffs in the design: synthetic data, phonemization, lack of train code, sharp boundary effects, etc. | ||