No mention of tool use. If the model can't emit text and audio at the same time, which is what tool calls need, it's not really useful for voice agents.