daxfohl 3 days ago

Very cool. I've been thinking for a while that this is where things will end up. While custom AI integrations per service/product/whatever can be better and more efficient, there's always going to be stuff your workflow needs that doesn't have an AI integration.

Without this, AI is going to be limited and kludgy. Like if I wanted to have AI run an FEA simulation on some CAD model, I'd have to wait until the FEA software, the CAD software, the corporate models repo, etc., etc. all have AI integrations, and then create some custom agent that glues them all together. Once AI can just control the computer effectively, it can look up the instruction manuals for each of these pieces of software online and then just have at it e2e like a human would. It can even ping you over Slack if it gets stuck on something.

I think once stuff like this becomes possible, custom AI integrations will become less necessary. I'm sure they'll continue to exist for special cases, but the other nice thing about a generic computer-use agent is that you can record the stream and see exactly what it's doing, which is a huge increase in observability. It can even demo to human workers how to do things, because it works through the same interfaces they do.

kevingadd 3 days ago

One potential virtuous cycle here is that accessibility trees used by tools like screen readers are also a nice potential way for a model to consume information about what's on screen and how it can be interacted with. So it creates an additional incentive for improving the accessibility of new and existing software, because doing that lights up integration with future models.
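To make the accessibility-tree idea concrete, here's a rough sketch of how a tree could be flattened into text a model can read and act on. The `AXNode` class, its fields, and the `[id]` numbering convention are all hypothetical simplifications for illustration, not any platform's actual accessibility API; real trees (browser ARIA, platform accessibility APIs) carry far more state.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-in for a node in an accessibility tree:
# a role, an accessible name, and whether it can take input.
@dataclass
class AXNode:
    role: str                  # e.g. "window", "button", "textbox"
    name: str = ""             # accessible name, e.g. a button's label
    focusable: bool = False    # whether the element can receive input
    children: list["AXNode"] = field(default_factory=list)


def serialize(root: AXNode) -> list[str]:
    """Flatten the tree into numbered, indented lines (depth-first).

    The numeric id per line is an assumed convention: a model could reply
    with something like "click 2" instead of guessing pixel coordinates.
    Focusable elements are marked with a trailing '*'.
    """
    lines: list[str] = []

    def walk(node: AXNode, depth: int) -> None:
        idx = len(lines)
        label = f' "{node.name}"' if node.name else ""
        marker = " *" if node.focusable else ""
        lines.append(f"[{idx}] {'  ' * depth}{node.role}{label}{marker}")
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Usage: a tiny made-up "Export" dialog.
tree = AXNode("window", "Export", children=[
    AXNode("textbox", "File name", focusable=True),
    AXNode("button", "Save", focusable=True),
    AXNode("button", "Cancel", focusable=True),
])
print("\n".join(serialize(tree)))
```

The point is that software with good labels and roles already produces something close to this for free, whereas an unlabeled custom widget shows up as an anonymous node that neither a screen reader nor a model can do much with.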

alhirzel 3 days ago

This cycle starts with an integration for model developers. I wonder if anyone is working on a generic ARIA hookup, as well as whatever standards are necessary for desktop/smartphone integration?