Remix clone Hacker News

	▲	BoxOfRain 3 days ago
		I've been experimenting with something similar to this approach recently. IndexTTS2 accepts emotion vectors as an input, so I ran an external emotion classification model on the LLM output and used it to modulate the TTS emotion vectors. You need to manage the state of the current affect with a bit of care or it sounds unhinged, but it's worked surprisingly well so far. I wired it together using Cats Effect. As you'd expect, latency isn't great, but I think it can be improved.
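
		For anyone wondering what "managing the state of the current affect" might look like, here is a minimal sketch in Python. It is not the code from my setup: the label set, the `AffectState` class, and the `alpha` and `max_intensity` parameters are all assumptions made up for illustration, and the real vector layout depends on the TTS model. The idea is just to blend each new classifier result into a running vector with an exponential moving average and cap the intensity, so one strongly classified sentence can't swing the voice all at once.

```python
from dataclasses import dataclass, field

# Hypothetical label set; the real order and size depend on the TTS model.
EMOTIONS = ("happy", "angry", "sad", "afraid", "surprised", "calm")


@dataclass
class AffectState:
    """Running emotion vector, smoothed across utterances."""

    alpha: float = 0.3          # how quickly the state follows new scores
    max_intensity: float = 0.8  # cap per emotion so nothing saturates
    vector: list[float] = field(
        default_factory=lambda: [0.0] * len(EMOTIONS)
    )

    def update(self, scores: dict[str, float]) -> list[float]:
        """Blend classifier scores into the state and return a copy of it."""
        # Labels missing from the classifier output count as 0.0.
        target = [scores.get(name, 0.0) for name in EMOTIONS]
        self.vector = [
            min(self.max_intensity, (1 - self.alpha) * old + self.alpha * new)
            for old, new in zip(self.vector, target)
        ]
        return list(self.vector)


if __name__ == "__main__":
    state = AffectState()
    # Scores as they might come from an external emotion classifier.
    for scores in ({"happy": 0.9}, {"angry": 0.7}, {"calm": 1.0}):
        print(state.update(scores))
```

		A lower `alpha` gives a steadier voice that reacts more slowly; a higher one tracks the text more closely at the risk of abrupt shifts.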