Remix.run Logo
observationist 10 hours ago

Nope, it's just a repackaging of the same problem, except in this case, the problem is solved with APIs and CLI and not jumping through hoops in order to get the AI to do what humans do.

It's about accomplishing a task, not making a bot accomplish a task using the same tools and embodiment context as a human - there's no upside, unless the bot is actually using a humanoid embodiment, and even then, using a CLI and service API is going to be preferable to doing things with UI in nearly every possible case, except where you want to limit to human-ish capabilities, like with gaming, or you want to deceive any monitors into thinking that a human is operating.

It's going to be infinitely easier to wrap a json get/push wrapper around existing APIs or automation interfaces than to universalize some sort of GUI interactions, because LLM's don't have the realtime memory you need to adapt to all the edge cases on the fly. It's incredibly difficult for humans, and hundreds of billions of dollars have been spent trying to make software universally accessible and dumbed down for users, and still ends up being either stupidly limited, or fractally complex in the tail, and no developer can ever account for all the possible ways in which users interact with a feature for any moderately complex piece of software.

Just use existing automation patterns. This is one case where if an AI picks up this capability alongside other advances, then awesome, but any sort of middleware is going to be a huge hack that immediately gets obsoleted by frontier models as a matter of course.