autoexec 2 days ago
That's even worse, because it would mean it wasn't a scripted recording that failed; it would mean the AI itself sucks and can't tell that the bowl is empty and nothing was combined. Either this was the failure of a recorded demo that was faked to hide how bad the AI is, or it accurately demonstrated that the AI itself is a failure. Either way it's not a good look.
fragilerock 2 days ago | parent
My layperson interpretation of this particular error is that the model probably generated the full recipe response up front, but when the audio was cut off by the user's interruption, the model was given no context about where it had been interrupted, so it didn't understand that the user never heard the first part of the recipe. I assume the responses from that point onward didn't take the video input into account, and the model simply assumed the user had completed the first step based on the conversation history.

I don't know how these 'live' AI sessions work internally, but judging from the existing OpenAI/Gemini live chat products, it seems that most of the time the model will comment on the video right when the 'live' chat starts, and for the rest of the conversation it runs on TTS+STT alone unless the user explicitly asks it to consider the visual input.

I'd guess that with enough experience with these live sessions you can see where it's going wrong and steer it back in the right direction with more explicit instructions, but that wouldn't look very slick in a developer keynote. In reality I think this feature could still be pretty useful, as long as you aren't expecting it to be as smooth as talking to a real person.
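A rough sketch of the failure mode I'm describing, in Python. The names and history format here are made up for illustration, not any real API:

    # The assistant generated this reply in full, but audio playback
    # was interrupted partway through.
    full_reply = ("Step 1: crack two eggs into the bowl and whisk. "
                  "Step 2: fold in the flour until just combined.")

    # What the model sees on the next turn is the conversation history,
    # which contains the *entire* reply text:
    history = [
        {"role": "user", "content": "How do I make the batter?"},
        {"role": "assistant", "content": full_reply},
        {"role": "user", "content": "OK, what's next?"},
    ]

    # What the user actually heard before interrupting:
    heard = full_reply[:31]  # playback stopped mid-sentence

    # Nothing in `history` records `heard`, so the model reasonably
    # assumes the user completed step 1, even though the bowl is empty.

IIRC some realtime APIs do let the client report how far audio playback actually got (OpenAI's Realtime API has a conversation.item.truncate event for this), so a pipeline that skips that step would produce exactly this kind of mismatch.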