H8crilA a day ago

Where to go next? I don't think anyone has gotten close to automating everyday PC usage, likely via screen capture and raw keyboard+mouse inputs. Imagine how much bigger that market would be than vibecoding.

wavemode a day ago | parent | next [-]

tbh I don't think this use case is going to be as big as people seem to think

there are a lot of reasons, but in brief - I think AI desktop use is a product that the average person isn't going to get much value out of. to make an analogy - the creators of Segway thought people would buy them in large numbers, but it turned out most people don't mind walking manually (or at least, don't mind it enough to spend money on a scooter). I think makers of AI Desktop Use products are going to find out the same thing as it relates to everyday tasks like checking email and shopping.

H8crilA a day ago | parent [-]

I was thinking more of remotely managing the computer in a warehouse, or replacing the mouse of an architect or some physical-object engineer. That your grandma can finally find Discord by speaking to such a bot is just a nice side effect.

wavemode a day ago | parent [-]

well yeah I wasn't even talking about professional use, since I think in professional use cases it will turn out to make a lot more sense to set up APIs that AIs use than to set up screen scraping and mouse+keyboard use.

in fact even in rare cases where it's not possible to get an API or CLI to interface with some piece of software, I think people will find that their best bet is to first create a deterministic screen-scraping program for that specific software, then have that program serve an API for the AI to use. it would be so much cheaper to run (inference-wise) and so much more reliable, than having the AI itself perform the image interpretation and clicking.

I see AI desktop use as mainly a consumer product for that reason, since that's the situation where you have to react "on the fly" to whatever the user asks you to do and whatever program happens to be on their computer (versus professional cases which are more large-scale and repetitive, and where you can have a software developer on hand).
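
To make the "deterministic scraper that serves an API" idea above concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the coordinates, the `Backend` primitives, the `lookup_stock` tool name, and the "Qty: 17" screen format are assumptions for illustration, not any real product's interface. A real backend would wrap whatever input-automation and OCR library you trust.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical fixed layout of one specific legacy app window.
# The target software is known ahead of time, so these can be hard-coded.
SEARCH_BOX = (120, 48)
SEARCH_BUTTON = (310, 48)
RESULT_REGION = (100, 90, 400, 30)  # x, y, width, height


@dataclass
class Backend:
    """Injected I/O primitives (assumed; a real one wraps an input/OCR library)."""
    click: Callable[[int, int], None]
    type_text: Callable[[str], None]
    read_region: Callable[[int, int, int, int], str]  # text found in the region


class LegacyAppAdapter:
    """Deterministic scraper: the same scripted steps every time, no inference."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def lookup_stock(self, sku: str) -> int:
        self.backend.click(*SEARCH_BOX)
        self.backend.type_text(sku)
        self.backend.click(*SEARCH_BUTTON)
        raw = self.backend.read_region(*RESULT_REGION)
        # Assumed on-screen format: "Qty: 17"
        return int(raw.split(":")[1].strip())


def handle_tool_call(adapter: LegacyAppAdapter, request_json: str) -> str:
    """The only surface the AI sees: JSON in, JSON out."""
    request = json.loads(request_json)
    if request.get("tool") != "lookup_stock":
        return json.dumps({"error": "unknown tool"})
    sku = request["args"]["sku"]
    return json.dumps({"sku": sku, "quantity": adapter.lookup_stock(sku)})
```

The model only ever produces and consumes small JSON messages; the pixel-level work happens in ordinary code that can be tested with a fake `Backend`, which is where the cost and reliability argument comes from.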

zozbot234 a day ago | parent | prev [-]

Automating GUI use is a silly idea when the AI can do many of the same things by getting access to a *nix command line - which is how all coding models work. It matters when driving proprietary apps or browsing websites that don't provide a clean machine-readable API, not really otherwise.
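
For comparison, the command-line route mentioned above needs very little glue. A rough sketch, assuming a POSIX `sh` is available; the function name `run_shell` and the returned dict shape are made up for illustration, and there is no sandboxing here, which a real setup would need:

```python
import subprocess


def run_shell(command: str, timeout_s: float = 10.0) -> dict:
    """Run one shell command and return the text a model would read back."""
    try:
        done = subprocess.run(
            ["sh", "-c", command],
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # decode bytes to str
            timeout=timeout_s,    # stop commands that hang
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": "timed out"}
    return {
        "exit_code": done.returncode,
        "stdout": done.stdout,
        "stderr": done.stderr,
    }
```

Text in, text out: no screenshots to interpret and no coordinates to click, which is why this is so much cheaper than driving a GUI when a command line exists.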