| ▲ | iLoveOncall 15 hours ago | ||||||||||||||||
Chatterbox-TTS has a MUCH MUCH better output quality though, the quality of the output from Sopro TTS (based on the video embedded on GitHub) is absolutely terrible and completely unusable for any serious application, while Chatterbox has incredible outputs. I have an RTX5090, so not exactly what most consumers will have but still accessible, and it's also very fast, around 2 seconds of audio per 1 second of generation. Here's an example I just generated (first try, 22 seconds runtime, 14 seconds of generation): https://jumpshare.com/s/Vl92l7Rm0IhiIk0jGors Here's another one, 20 seconds of generation, 30 seconds of runtime, which clones a voice from a Youtuber (I don't use it for nefarious reasons, it's just for the demo): https://jumpshare.com/s/Y61duHpqvkmNfKr4hGFs with the original source for the voice: https://www.youtube.com/@ArbitorIan | |||||||||||||||||
| ▲ | sammyyyyyyy 15 hours ago | parent | next [-] | ||||||||||||||||
You should try it! I wouldn’t say it’s the best, far from that. But also wouldn’t say it’s terrible. If you have a 5090, then yes, you can run much more powerful models in real time. Chatterbox is a great model though | |||||||||||||||||
| |||||||||||||||||
| ▲ | kkzz99 14 hours ago | parent | prev [-] | ||||||||||||||||
I've been using Higgs-Audio for a while now as the primary TTS system. How would you say does Chatterbox compare to it if you have experience? | |||||||||||||||||
| |||||||||||||||||