Remix clone Hacker News

	▲	sosodev 3 days ago
		Huh, you're right. I tried your test and it clearly can't understand the difference between homophones. That seems to imply they're transcribing the audio with some sort of speech-to-text (STT) step before it reaches the model, which is really weird because Qwen3-Omni claims to support direct audio input into the model. Maybe it's a cost-saving measure?
	▲	sosodev 3 days ago
		Weirdly, I just tried it again and it seems to understand the difference between "record" (the noun) and "record" (the verb) just fine. Perhaps when there's heavy demand for voice chat, like after a new release, they shed load by routing audio through speech-to-text into a smaller model. However, it still doesn't seem capable of producing any of the sounds, like laughter, that I would expect from a native voice model.
	▲	potatoman22 2 days ago
		To be fair, discerning heteronyms might just be a gap in its training.