echelon 5 hours ago
Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse? Which LLMs drive these best: Claude, Gemini, etc.? Is anything local actually competent at it? Can they understand layout and visual cues through a VLM or other multimodal input? Are they robust enough to interact with three.js scenes, videos, and whatnot, or can they only blindly navigate the DOM?