rahimnathwani 11 hours ago
If an agent gets a copy of the screen using browser_screenshot and then wants to click somewhere on that screen, how is it meant to find the right CSS selector to pass to browser_click? There's a browser_find method, but that assumes you already know what type of element you're looking for, and I can't always tell that just by looking at a screenshot. What have I missed or misunderstood?
coty 11 hours ago
For right now, the MCP server doesn't expose quite enough for an agent to navigate on its own. I've added a browser_evaluate tool in my fork, though I haven't committed it or opened a PR yet. With that, the agent can call JavaScript to get the accessibility tree and then use that to navigate via browser_find. This and much more will be coming soon. See the V2 roadmap for more insight: https://github.com/VibiumDev/vibium/blob/main/V2-ROADMAP.md
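
To make the screenshot-to-element step concrete, here is a rough sketch of how an agent might map a screenshot coordinate to an element once it has accessibility data. Everything in it is an assumption for illustration: the `AXNode` shape, the `node_at_point` helper, and the idea that the tree arrives as a flat list of nodes with bounding boxes are all hypothetical, and none of it is Vibium's actual API or output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AXNode:
    """Hypothetical flattened accessibility node (not Vibium's real schema)."""
    role: str      # e.g. "button", "link", "form"
    name: str      # accessible name, e.g. "Submit"
    x: float       # left edge, in screenshot pixels
    y: float       # top edge, in screenshot pixels
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        """True if the point lies inside this node's bounding box."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    @property
    def area(self) -> float:
        return self.width * self.height


def node_at_point(nodes: list[AXNode], px: float, py: float) -> Optional[AXNode]:
    """Return the smallest node whose box contains (px, py), or None.

    Picking the smallest containing box is a simple heuristic for
    "the most specific element under the cursor".
    """
    hits = [n for n in nodes if n.contains(px, py)]
    return min(hits, key=lambda n: n.area, default=None)


# Assumed sample data: what a browser_evaluate call might hand back
# after some JavaScript has collected roles, names, and bounding boxes.
nodes = [
    AXNode("form", "Sign up", 0, 0, 400, 300),
    AXNode("textbox", "Email", 50, 100, 300, 30),
    AXNode("button", "Submit", 150, 200, 100, 40),
]

target = node_at_point(nodes, 180, 215)
```

The role and name of `target` could then be handed to something like browser_find, which is how the element type stops being a guess from the screenshot. The exact parameters browser_find accepts are not shown in this thread, so that last step is left out of the sketch.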