water-drummer 3 hours ago

Gemini Live API and Grok Voice API can make tool calls, and they're speech-to-speech models.

d4rkp4ttern 2 hours ago | parent [-]

Right, turns out Claude and ChatGPT voice can also do web search. So I guess behind the scenes there is more than a "pure" voice-to-voice model being used, i.e. there's probably a rudimentary agent loop with tools and tool execution interposed.
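
To make the guess concrete, here is a rough sketch of what such an interposed loop might look like. Everything in it is hypothetical: `fake_voice_model`, `fake_web_search`, `ModelTurn`, and `agent_loop` are stand-ins made up for illustration, text strings stand in for audio, and none of this reflects how any vendor actually implements it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelTurn:
    """One model output: either speech to play back, or a tool request."""
    speech: Optional[str] = None
    tool_name: Optional[str] = None
    tool_args: Optional[dict] = None


def fake_web_search(query: str) -> str:
    # Hypothetical tool: a real system would call a search backend here.
    return f"stub results for {query!r}"


TOOLS: dict[str, Callable[..., str]] = {"web_search": fake_web_search}


def fake_voice_model(history: list[tuple[str, str]]) -> ModelTurn:
    # Hypothetical model: asks for a search first, then answers
    # once a tool result is present in the history.
    tool_results = [content for role, content in history if role == "tool"]
    if tool_results:
        return ModelTurn(speech=f"Here is what I found: {tool_results[-1]}")
    return ModelTurn(tool_name="web_search", tool_args={"query": history[-1][1]})


def agent_loop(user_utterance: str,
               model: Callable[[list], ModelTurn] = fake_voice_model,
               tools: dict = TOOLS,
               max_steps: int = 5) -> str:
    """Run the model, executing requested tools until it produces speech."""
    history = [("user", user_utterance)]
    for _ in range(max_steps):
        turn = model(history)
        if turn.tool_name is None:
            return turn.speech  # final spoken answer
        # Tool execution happens outside the model, then feeds back in.
        result = tools[turn.tool_name](**turn.tool_args)
        history.append(("tool", result))
    return "Sorry, I could not finish that request."


print(agent_loop("weather in Paris"))
```

The point is just that the tool call and its result pass through ordinary orchestration code between model turns, with a step cap so a model that keeps requesting tools can't loop forever.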