Remix clone Hacker News

	sosodev 3 days ago
		I think it's because they've crammed vision, audio, multiple voices, prosody control, multiple languages, etc. into just 30 billion parameters. To my ear, ChatGPT's voice models have the most lifelike speech. They seem to have invested heavily in that area while other labs focused elsewhere.