Chatterbox-TTS has MUCH better output quality though. The output from Sopro TTS (based on the video embedded on GitHub) is absolutely terrible and completely unusable for any serious application, while Chatterbox's output is incredible.

I have an RTX 5090, which is not exactly what most consumers will have but is still accessible, and Chatterbox is also very fast on it: around 2 seconds of audio per 1 second of generation.

Here's an example I just generated (first try, 22 seconds runtime, 14 seconds of generation): https://jumpshare.com/s/Vl92l7Rm0IhiIk0jGors

Here's another one (20 seconds of generation, 30 seconds of runtime), which clones a voice from a YouTuber (I don't use it for nefarious reasons, it's just for the demo): https://jumpshare.com/s/Y61duHpqvkmNfKr4hGFs and here is the original source for the voice: https://www.youtube.com/@ArbitorIan
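
As a rough sanity check on the speed figures, here is a minimal sketch that computes seconds of audio per second of generation for the two samples. It assumes "runtime" means the length of the generated audio, and the function name `speed_factor` is made up for illustration.

```python
def speed_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of generation (above 1.0 is faster than real time)."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# (audio seconds, generation seconds) for the two linked samples,
# assuming "runtime" refers to the length of the generated audio
samples = [(22, 14), (30, 20)]
factors = [speed_factor(audio, gen) for audio, gen in samples]

for (audio, gen), factor in zip(samples, factors):
    print(f"{audio}s of audio in {gen}s of generation: {factor:.2f}x real time")
```

Under that assumption both clips come out at roughly 1.5x real time, a bit under the rough "around 2 seconds of audio per second" figure.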

sammyyyyyyy 15 hours ago | parent | next [-]

You should try it! I wouldn’t say it’s the best, far from it. But also wouldn’t say it’s terrible. If you have a 5090, then yes, you can run much more powerful models in real time. Chatterbox is a great model, though.

iLoveOncall 14 hours ago | parent [-]

> But also wouldn’t say it’s terrible.

But you included 3 samples in your GitHub video, and they all sound extremely robotic and have very bad artifacts?

samuel-vitorino 14 hours ago | parent [-]

[dead]

kkzz99 14 hours ago | parent | prev [-]

I've been using Higgs-Audio for a while now as my primary TTS system. How would you say Chatterbox compares to it, if you have experience with it?

iLoveOncall 14 hours ago | parent [-]

I haven't used Higgs-Audio. I compared Chatterbox with T5Gemma TTS, which came out recently, and Chatterbox is much better in all aspects, especially voice cloning, where T5Gemma basically did not work.