ahmedhawas123 3 days ago
This is cool, though I wanted to share a couple of thoughts for reflection. I feel like your demo video is not the best one to highlight the capability. A browsing use case likely does require a key press -> planning loop, but a gaming use case, or well-known software (e.g., Excel), may be able to think ahead 10-20 key presses before needing the next loop / verification. The current demo makes it seem slow / prototype-like.

Also, the X/Y approach is interesting as a generic approach to screen management. But for browsers, for example, you're likely adding overhead relative to just marking the specific divs/buttons that are on screen and having those be part of the reasoning (e.g., "Click button X at div with path XX"). It may be helpful to think about the workflows you are going after and what kind of accelerated handling you can apply to each.