Lasang 2 hours ago
The idea of exposing a structured crawl endpoint feels like a natural evolution of robots.txt and sitemaps. If more sites provided explicit machine-readable entry points for crawlers, indexing could become a lot less wasteful. Right now crawlers spend a lot of effort rediscovering the same structure over and over. It also raises interesting questions about whether sites will eventually provide different views for humans vs. automated agents in a more formalized way.
_heimdall 2 hours ago
I expect that if we still used REST, indexing would be even less wasteful. I've found myself falling pretty hard on the side of making APIs work for humans and expecting LLM providers to optimize around that. I don't need an MCP for a CLI tool, for example; I just need a good man page or `--help` documentation.
pocksuppet 19 minutes ago
Apart from the obvious problem: presenting something different to crawlers and humans.
catlifeonmars 2 hours ago
> It also raises interesting questions about whether sites will eventually provide different views for humans vs. automated agents in a more formalized way.

That in turn raises the question of whether this would exacerbate supply chain injection attacks: show the innocuous page to the human, another to the bot.
pdntspa an hour ago
They already do... A lot of known crawlers get a crawler-optimized version of the page.
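For illustration only, here is a rough sketch of how that kind of crawler-specific serving is often decided: a substring match on the `User-Agent` header. The token list and the names `is_known_crawler` and `choose_variant` are assumptions made up for this example, not any particular site's implementation. The header is trivially spoofable, which is part of why the cloaking concern raised above is hard to police.

```python
from typing import Optional

# Illustrative, non-exhaustive list of crawler User-Agent tokens (assumption).
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot")


def is_known_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)


def choose_variant(user_agent: Optional[str]) -> str:
    """Pick which page variant to serve: 'crawler' or 'human'."""
    return "crawler" if is_known_crawler(user_agent) else "human"
```

A missing header falls through to the human variant, so the crawler view is only served on a positive match.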
rglover 2 hours ago
I just use a query param: if `?llm=true` is set on a route, the response switches to markdown/text. Easy pattern that's opt-in.
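A minimal sketch of that opt-in toggle, assuming a Python handler that already has both renderings of the page available. The function names `wants_llm_view` and `render` are hypothetical, and the comment does not say how the real route is implemented; only the `?llm=true` parameter comes from the comment itself.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def wants_llm_view(url: str) -> bool:
    """Return True when the request URL opts in with ?llm=true."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; use the last one given.
    values = query.get("llm", [])
    return bool(values) and values[-1].lower() == "true"


def render(url: str, html: str, markdown: str) -> Tuple[str, str]:
    """Pick the (content_type, body) pair for this request."""
    if wants_llm_view(url):
        return ("text/markdown; charset=utf-8", markdown)
    # Default stays HTML, so the plain-text view is strictly opt-in.
    return ("text/html; charset=utf-8", html)
```

Because the switch lives in the URL rather than in the `User-Agent`, anyone can request the same view a bot gets and compare the two.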