AskUI could be a solution. It's also not limited to the browser; it works across the whole desktop: https://github.com/askui/vision-agent
Thanks! Looks promising!