Remix.run Logo
dvt 5 hours ago

I’m working on a DOM agent and I think MCP is overkill. You have a few “layers” you can imply by just executing some simple JS (eg: visible text, clickable surfaces, forms, etc). 90% of the time, the agent can imply the full functionality, except for the obvious edge cases (which trip up even humans): infinite scrolling, hijacking navigation, etc.

Garlef 4 hours ago | parent | next [-]

Question: Are you writing this under the assumption that the proposed WebMCP is for navigating websites? If so: It is not. From what I've gathered, this is an alternative to providing an MCP server.

Instead of letting the agent call a server (MCP), the agent downloads javascript and executes it itself (WebMCP).

0x696C6961 5 hours ago | parent | prev | next [-]

In what world is this simpler than just giving the agent a list of functions it can call?

Mic92 4 hours ago | parent | next [-]

So usually MCP tool calls a sequential and therefore waste a lot of tokens. There is some research from Antrophic (I think there was also some blog post from cloudflare) on how code sandboxes are actually a more efficient interface for llm agents because they are really good at writing code and combining multiple "calls" into one piece of code. Another data point is that code is more deterministic and reliable so you reduce the hallucination of llms.

foota 4 hours ago | parent [-]

What do the calls being sequential have to do with tokens? Do you just mean that the LLM has to think everytime they get a response (as opposed to being able to compose them)?

zozbot234 4 hours ago | parent [-]

LLMs can use CLI interfaces to compose multiple tool calls, filter the outputs etc. instead of polluting their own context with a full response they know they won't care about. Command line access ends up being cleaner than the usual MCP-and-tool-calls workflow. It's not just Anthropic, the Moltbot folks found this to be the case too.

foota 4 hours ago | parent [-]

That makes sense! The only flaw here imo is that sometimes that thinking is useful. Sub-agents for tool calls imo make a nice sort of middle ground where they can both be flexible and save context. Maybe we need some tool call composing feature, a la io_uring :)

dvt 4 hours ago | parent | prev [-]

Who implements those functions? E.g., store.order has to have its logic somewhere.

Mic92 5 hours ago | parent | prev [-]

Do expose the accessibility tree of a website to llms? What do you do with websites that lack that? Some agents I saw use screenshots, but that seems also kind of wasteful. Something in-between would be interesting.

dvt 4 hours ago | parent [-]

I actually do use cross-platform accessibility shenanigans, but for websites this is rarely as good as just doing like two passes on the DOM, it even figures out hard stuff like Google search (where ids/classes are mangled).