dharma1 3 hours ago

Yes, the voice part of OpenAI realtime/voice mode is great, but it’s pretty dumb compared to newer models and often gets stuck repeating itself.

Google’s Gemini flash live 3.1 is better, especially when used via the API - it can do tool calling (including to other, even smarter LLMs if you set it up yourself), you can set the reasoning level (even high is still close enough to realtime), and it can ground answers in Google Search. I love bidirectional voice, and right now it’s probably the best option. You can try it in AI Studio.

Lucasoato 3 hours ago | parent [-]

Thanks, I’ll try it, even though my experience with Google models hasn’t been that great lately (503s).

dharma1 2 hours ago | parent [-]

Give it a shot: use the 3.1 live one in AI Studio/API and max out reasoning, not the one in the Gemini app, which is an older model.

Another option is to use Pipecat with its VAD, separate STT and TTS, and any (fast) LLM of your choice, but it’s more plumbing and not a true speech-to-speech model.
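
Roughly, the plumbing has this shape. This is a toy sketch in plain Python, not Pipecat's actual API: the energy-threshold VAD, the `segment_utterances` helper and the `stt`/`llm`/`tts` callables are all placeholders made up here to show how the cascade hangs together.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one chunk of audio samples in [-1.0, 1.0]


def is_speech(frame: Frame, threshold: float = 0.02) -> bool:
    """Toy VAD: treat a frame as speech if its mean absolute amplitude is high."""
    if not frame:
        return False
    return sum(abs(s) for s in frame) / len(frame) > threshold


def segment_utterances(
    frames: Iterable[Frame], hangover: int = 3
) -> Iterator[List[Frame]]:
    """Group speech frames into utterances.

    An utterance ends after `hangover` consecutive silent frames.
    Silent frames themselves are dropped.
    """
    current: List[Frame] = []
    silent = 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silent = 0
        elif current:
            silent += 1
            if silent >= hangover:
                yield current
                current, silent = [], 0
    if current:  # flush whatever is left when the stream ends
        yield current


@dataclass
class CascadedVoicePipeline:
    """VAD -> STT -> LLM -> TTS, each stage a swappable callable (hypothetical)."""

    stt: Callable[[List[Frame]], str]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]

    def run(self, frames: Iterable[Frame]) -> Iterator[bytes]:
        for utterance in segment_utterances(frames):
            text = self.stt(utterance)   # speech to text
            reply = self.llm(text)       # any (fast) LLM
            yield self.tts(reply)        # text back to audio
```

Every stage waits for the previous one to finish, which is where the extra latency comes from compared to a single speech-to-speech model.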

stavros an hour ago | parent [-]

Haha, wow, I never thought I'd see a voice model that was too quick, but 3.1 live felt like it responded unnaturally quickly! I'm kind of blown away, I'd want to insert a 100ms delay to make it sound more natural, wow. I never thought I'd see that.
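
The padding I have in mind would be tiny. A rough sketch, assuming you can timestamp the moment the user stopped speaking and have some callback that starts playback; `user_stopped_at`, `play` and `min_gap_s` are names made up here, not from any SDK:

```python
import asyncio
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def remaining_delay(user_stopped_at: float, now: float, min_gap_s: float = 0.1) -> float:
    """Seconds still to wait so the reply starts at least `min_gap_s` after the user stopped."""
    return max(0.0, min_gap_s - (now - user_stopped_at))


async def play_with_min_gap(
    play: Callable[[], T], user_stopped_at: float, min_gap_s: float = 0.1
) -> T:
    """Hold playback back only if the model answered faster than `min_gap_s`."""
    await asyncio.sleep(remaining_delay(user_stopped_at, time.monotonic(), min_gap_s))
    return play()
```

If the model already took longer than the gap, the delay is zero, so this never makes a slow response slower.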