ronsor 4 hours ago

robots.txt is for automated, headless crawlers, NOT user-initiated actions. If a human directly triggers the action, then robots.txt should not be followed.

hyperhopper 4 hours ago | parent [-]

But what action are you triggering that automatically follows invisible links? Especially ones that aren't meant to be followed and carry text saying not to follow them.

This is not banning you for following <h1><a>Today's Weather</a></h1>

If you are a robot that's so poorly coded that it follows links it clearly shouldn't, links that are explicitly enumerated as not to be followed, that's a problem. From an operator's perspective, how is this different from the case you described?

If a googler kicked off the googlebot manually from a session every morning, should they not respect robots.txt either?

varenc 4 hours ago | parent [-]

I was responding to someone earlier saying a user agent should respect robots.txt. An LLM-powered user agent wouldn't follow links, invisible or not, because it's not crawling.

hyperhopper 3 hours ago | parent [-]

It very feasibly could. If I made an LLM agent that clicks on a returned element, and that element was this trapdoor link, that would happen.
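
For what it's worth, an agent like that could filter its candidate links against robots.txt before clicking anything. A rough sketch using Python's standard urllib.robotparser; the robots.txt contents, the /trap/ path, the agent name, and the allowed_links helper are all made up for illustration, not taken from any real site or agent:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the /trap/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed_links(robots_txt, user_agent, links):
    """Return only the links that robots.txt permits for this user agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in links if parser.can_fetch(user_agent, url)]


# Candidate links the agent extracted from a page (hypothetical URLs).
links = [
    "https://example.com/weather",
    "https://example.com/trap/do-not-follow",
]

# "my-llm-agent" is a placeholder user-agent string.
safe = allowed_links(ROBOTS_TXT, "my-llm-agent", links)
print(safe)
```

This only covers the robots.txt half; hidden links and rel="nofollow" would need a separate check on the markup itself.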