Lucasoato 3 hours ago
Wait a minute... I’m genuinely happy that they are sharing this, but keep in mind that the realtime audio models from OpenAI are still stuck with the 4o family in terms of capabilities, sadly. I still find them so useful, and it's such a pity that there’s no real competitor in this segment; having the experience of a real conversation has helped me so much in expressing ideas and concepts. Still, it’s worth keeping in mind that these are no longer frontier models, unlike when they were released. (Please Sam, if you read this, release the new realtime audio models)
modeless an hour ago
Grok voice is surprisingly good, actually. It's still a dumber model than the thinking modes of frontier models, but it's less dumb than the voice modes of other providers.
dharma1 3 hours ago
Yes, the voice part of OpenAI realtime/voice mode is great, but it’s pretty dumb compared to newer models and often gets stuck repeating itself. Google’s Gemini flash live 3.1 is better, especially when used via the API: it can do tool calling (including to other, even smarter LLMs if you set it up yourself), you can set the reasoning level (even high is still close enough to realtime), and it can ground answers in Google Search. I love bidirectional voice, and right now it’s probably the best option. You can try it in AI Studio.
artdigital an hour ago
This is what makes their voice mode unusable to me. I can’t stand the way 4o replies, and it’s such a big drop in quality from text mode.
ddp26 2 hours ago
Yeah, the question in the title can be answered: "by using gpt-4o, a model 2 years behind the frontier, to serve audio responses"