echelon 5 hours ago

Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse?

Which LLMs best drive these? Claude/Gemini, etc., or is anything local actually competent at it?

Can they understand layout and visual cues through a VLM or other multimodal input?

Are they robust enough to interact with three.js scenes, videos, and whatnot, or can they only navigate the DOM blindly?