drmath 5 hours ago

One source of trouble here is that the agent's view of the web page is so different from the human's. We could reduce the incidence of these problems by making the two views more similar.

Agents often have some DOM-to-markdown tool they use to read web pages. If you used the same tool (via a "reader mode") to view the web page, you'd be assured that the thing you're telling the agent to read is the same thing you're reading. Cursor / Antigravity / etc. could have an integrated web browser to support this.
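
A rough sketch of that shared-reader idea, assuming a toy converter rather than any real agent's tool. `MarkdownExtractor` and `to_markdown` are hypothetical names, and this only handles headings, paragraphs, and links. The point is just that the human's reader mode and the agent's fetch tool would call the same function:

```python
import re
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical; headings, paragraphs, links)."""

    SKIP = {"script", "style", "nav"}  # elements hidden from both readers

    def __init__(self):
        super().__init__()
        self.parts = []      # output fragments, joined at the end
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.href = None     # target of the link currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in ("h1", "h2", "h3"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and not self.skip_depth:
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs so source line breaks don't leak through.
            self.parts.append(re.sub(r"\s+", " ", data))


def to_markdown(html: str) -> str:
    """Single entry point used by BOTH the agent tool and the human reader mode."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    blocks = (block.strip() for block in "".join(parser.parts).split("\n\n"))
    return "\n\n".join(block for block in blocks if block)


page = (
    "<html><head><style>p{}</style></head><body>"
    "<nav><a href='/'>Home</a></nav>"
    "<h1>Title</h1>"
    '<p>Hello <a href="https://example.com">world</a>.</p>'
    "<script>x()</script></body></html>"
)

agent_view = to_markdown(page)  # what the agent's fetch tool would return
human_view = to_markdown(page)  # what the reader mode would display
```

Since both views come from one function, any content the converter drops (here the nav, style, and script elements) is missing for the human too, so you'd notice it before asking the agent about it.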

That would make what the human sees closer to what the agent sees. We could also go the other way by having the agent's web browsing tool return web page screenshots instead of DOM / HTML / Markdown.