potatoman22 3 days ago
From what I can tell, their official chat site doesn't have a native audio -> audio model yet. I like to test this with homophones (e.g. record and record) and by asking it to change its pitch or produce sounds.
dragonwriter 3 days ago
“record and record”, if you mean the verb for persisting something and the noun for the thing persisted, are heteronyms (homographs which are not homophones). Incidentally, heteronyms are also probably what you want for the test you are describing. Distinguishing homophones would test the use of context to understand meaning, but wouldn’t test whether the logic is working directly on audio or only on text transcribed from audio. Failing to distinguish heteronyms, on the other hand, suggests that processing occurs on text, not directly on audio.
sosodev 3 days ago
Huh, you're right. I tried your test and it clearly can't tell the two pronunciations apart. That seems to imply they're transcribing the audio with some sort of speech-to-text (STT) step before it reaches the model, which is really weird because Qwen3-Omni claims to support direct audio input into the model. Maybe it's a cost-saving measure?
djtango 3 days ago
Is record a homophone? At least in the UK we use different pronunciations for the two meanings: re-cord for the verb, rec-ord for the noun.