If an agent captures the screen with browser_screenshot and then wants to click somewhere on it, how is it meant to find the right CSS selector to pass to browser_click?

There's a browser_find method, but it assumes you already know what type of element you're looking for, and I can't always tell that just by looking at a screenshot.

What have I missed or misunderstood?

coty 11 hours ago | parent [-]

Right now, the MCP server doesn’t expose quite enough for an agent to navigate on its own.

I’ve added a browser_evaluate tool in my fork—though I haven’t committed or pushed a PR yet. With that, the agent can call JavaScript to get the accessibility tree and then use that to navigate via browser_find.
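To make that concrete, here is a rough sketch of the kind of script an agent might run through such a tool. It is not the unpublished fork's code. It assumes the tool can run arbitrary page JavaScript and return a JSON-serializable value, and it only approximates the accessibility tree from tag names, `role`, and `aria-label`. The names `ElLike`, `Target`, `IMPLICIT_ROLES`, and `listTargets` are hypothetical.

```typescript
// Minimal structural view of a DOM element, so the walker can also run outside a browser.
interface ElLike {
  tagName: string;
  id?: string;
  textContent: string | null;
  children: ArrayLike<ElLike>;
  getAttribute(name: string): string | null;
}

// One clickable or focusable thing the agent could act on.
interface Target {
  role: string;     // explicit ARIA role, or a rough guess from the tag name
  name: string;     // aria-label if present, otherwise trimmed text content
  selector: string; // CSS selector built while walking the tree
}

// Simplified tag-to-role guesses (assumption: real implicit roles also depend on
// attributes such as an input's type, which this sketch ignores).
const IMPLICIT_ROLES: Record<string, string> = {
  a: "link",
  button: "button",
  input: "textbox",
  select: "combobox",
  textarea: "textbox",
};

function listTargets(root: ElLike, rootSelector = "body"): Target[] {
  const out: Target[] = [];
  const visit = (el: ElLike, selector: string): void => {
    const tag = el.tagName.toLowerCase();
    const role = el.getAttribute("role") ?? IMPLICIT_ROLES[tag];
    if (role) {
      const name = (el.getAttribute("aria-label") ?? el.textContent ?? "").trim();
      out.push({ role, name, selector });
    }
    for (let i = 0; i < el.children.length; i++) {
      const child = el.children[i];
      // Prefer an id when there is one; otherwise extend the parent's path.
      // (Ids containing special characters would need escaping; omitted here.)
      const childSelector = child.id
        ? `#${child.id}`
        : `${selector} > ${child.tagName.toLowerCase()}:nth-child(${i + 1})`;
      visit(child, childSelector);
    }
  };
  visit(root, rootSelector);
  return out;
}
```

In a page, calling `listTargets(document.body)` would give the agent a list of role, name, and selector entries. It could then match what it sees in the screenshot (say, a button labelled "Save") against `role` and `name`, and pass the matching `selector` on to browser_click.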

This and much more will be coming soon. See the V2 roadmap for more insight: https://github.com/VibiumDev/vibium/blob/main/V2-ROADMAP.md

	hugs 7 hours ago | parent [-]
		one of the wild things about vibe coding is... i want to add that feature, but i'm slightly more interested in using the prompt/spec you might have used to create it, not the patch itself.