why_at (6 hours ago):
My first impression is skepticism. Anything with voice controls for routine use is a pretty tough sell: using this when you're not completely alone would be annoying to everyone around you. Most of their examples could have been done with a right-click drop-down menu, so they don't really need to "re-invent the mouse pointer".

Is this thing talking to Google's servers all the time for the AI integration, so it won't work if you're not connected to the internet? The privacy concerns are obvious: now Google wants an AI watching literally everything you do on your computer?

Does the LLM use cost the user anything? If it's free, will it stay free forever? That's a lot of compute to give away if they expect people to use it to change a single word, as in one of their examples. I guess they expect to make the money back by gathering data about everything you do on your computer. There might be a killer app for AI integration with personal computers that has yet to be invented, but this doesn't look like it.

fny (14 minutes ago):
It's possible to rely on mouth movements instead of sound. I've been tweaking visual speech recognition (VSR) models for the past few weeks so that I can "talk" to my agents at the office without pissing everyone off. It works okay. Limiting the language to "move this" and "clear that" alongside context cues vastly simplifies the problem and makes it far more feasible on device. I think it's brilliant UX.
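A rough sketch of how far a fixed command grammar can take you, independent of any particular VSR model: instead of open-ended transcription, snap whatever noisy text the recognizer emits to the nearest phrase in a small command set. The command list, function name, and cutoff below are made up for illustration; only the Python stdlib is used.

    # Snap a noisy recognizer transcript to the nearest known command.
    # The commands and threshold are illustrative, not from any real system.
    import difflib

    COMMANDS = ["move this", "clear that", "select all", "undo"]

    def snap_to_command(raw_transcript, cutoff=0.6):
        """Return the closest known command, or None if nothing is close."""
        matches = difflib.get_close_matches(
            raw_transcript.lower(), COMMANDS, n=1, cutoff=cutoff)
        return matches[0] if matches else None

    print(snap_to_command("mve this"))     # -> "move this"
    print(snap_to_command("open a file"))  # -> None

On device, this kind of closed-vocabulary matching (or, better, constraining the decoder itself to the grammar) is far cheaper and more robust than free-form dictation, which is presumably why the limited phrasing helps so much.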

concinds (3 hours ago):
The second half of your comment is a go-to-market concern, which doesn't feel so relevant for a research prototype. It could be done with a private local model too, though maybe not by Google. But I don't think the voice problem is surmountable: I closed their image-editing demo when I saw it required a mic. It would be appealing as a Spotlight-like text pop-up where you type instructions, which would work in social/office environments, but that might only appeal to power users.
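For what a typed, Spotlight-like entry point could look like, here's a minimal sketch: a borderless popup with a single text field that hands the typed instruction off to whatever model you like. The dispatch is just a print stub, and tkinter is only a placeholder toolkit, not a claim about how this should actually be built.

    # Minimal Spotlight-style typed-instruction popup; the model call is a stub.
    import tkinter as tk

    def on_submit(event):
        instruction = entry.get()
        print(f"dispatch to model: {instruction!r}")  # stub: send to a local/remote model here
        root.destroy()

    root = tk.Tk()
    root.overrideredirect(True)   # borderless, Spotlight-like window
    root.geometry("+400+200")     # rough screen position
    entry = tk.Entry(root, width=60, font=("Helvetica", 16))
    entry.pack(padx=8, pady=8)
    entry.focus_set()
    entry.bind("<Return>", on_submit)
    entry.bind("<Escape>", lambda e: root.destroy())
    root.mainloop()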

nolist_policy (5 hours ago):
The "Edit an Image" Demo at the bottom is pretty fun. Maybe this is just Google flexing their LLM inference capacity. | |||||||||||||||||||||||||||||

AirMax98 (5 hours ago):
Right, it does seem cool, but the voice is patching over a major gap. If I'm talking already, why wouldn't I just describe what I'm looking at and have the AI grab it for me?