Remix clone Hacker News

	gcr a day ago
		Chatterbox TTS does this in “voice cloning” mode, but you have to implement the streaming part yourself. There are two inputs: audio A (“style”) and audio B (“content”). The timbre is taken from A, and the content, pronunciation, prosody, accent, etc. are taken from B. Strictly speaking, voice cloning models like this and Chatterbox are not “TTS” - they’re better thought of as “S+STS”, that is, speech+style to speech.
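		To make the “S+STS” shape and the do-it-yourself streaming concrete, here is a rough sketch. None of this is Chatterbox's actual API: `Converter`, `stream_convert`, `chunked`, and `passthrough` are hypothetical names, and `passthrough` is a dummy that just echoes the content audio so the example runs without a model.

```python
from typing import Callable, Iterable, Iterator, List

Audio = List[float]  # mono samples; a stand-in for a real array/tensor type

# Hypothetical S+STS signature: (style audio A, content audio B) -> output audio.
Converter = Callable[[Audio, Audio], Audio]


def chunked(samples: Audio, size: int) -> Iterator[Audio]:
    """Split the content audio into fixed-size chunks (the last may be shorter)."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def stream_convert(
    convert: Converter, style: Audio, content_chunks: Iterable[Audio]
) -> Iterator[Audio]:
    """Convert content chunk by chunk, reusing the same style reference each time."""
    for chunk in content_chunks:
        yield convert(style, chunk)


def passthrough(style: Audio, content: Audio) -> Audio:
    """Dummy converter (assumption, not a model): returns the content unchanged."""
    return list(content)


if __name__ == "__main__":
    style_a = [0.0, 0.1]                    # audio A: timbre reference
    content_b = [1.0, 2.0, 3.0, 4.0, 5.0]   # audio B: what is being said
    for out in stream_convert(passthrough, style_a, chunked(content_b, 2)):
        print(out)
```

		With a real model in place of `passthrough`, naive chunking like this would probably produce audible seams, since each chunk is converted without context from its neighbours; overlapping chunks and crossfading the outputs is one common way to soften that.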