Roark66 9 hours ago

I'm glad the author clarified that he wants to prevent his instance from crashing, not simply "block robots and allow humans".

I think the idea that you can block bots and allow humans is fallacious.

We should focus on the specific behaviour that causes problems (like making a bajillion requests, one per commit, instead of cloning the repo) and block clients that behave that way. If these bots learn to request at a reasonable pace, who cares whether they are bots, humans, bots under the control of an individual human, or bots owned by a huge company scraping for training data? Once you make your code (or anything else) public, trying to limit access to only a certain class of consumers is a waste of effort.
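Behaviour-based blocking of the kind described above usually comes down to per-client rate limiting. A minimal sketch of the idea, as a token bucket (all names and parameters here are illustrative, not from any particular server):

```python
import time

class TokenBucket:
    """Per-client token bucket: admits bursts up to `capacity`,
    sustains `rate` requests per second, and rejects the rest.
    A well-behaved client (human or bot) never notices it;
    a one-request-per-commit crawler gets throttled fast."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The point is that the gate only looks at request pacing, never at who the client claims to be.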

Also, perhaps I'm biased, because I run SearXNG and Crawl4AI (and a few ancillaries like a Jina reranker, etc.) in my homelab, so I can tell my AI to perform live internet searches, and it can fetch almost any website. For code it has a way to clone repos, but for things like issues, discussions, and PRs it goes mostly to GitHub.
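For anyone curious how a setup like this is wired up: SearXNG instances can expose a JSON search API (it has to be enabled via `format: json` in the instance's settings.yml). A small sketch, with a hypothetical homelab address:

```python
from urllib.parse import urlencode

def searxng_query_url(base_url: str, query: str, engines=None) -> str:
    """Build a SearXNG JSON-API search URL.
    NOTE: the JSON format must be enabled in the instance's
    settings.yml; `base_url` here is a made-up homelab address."""
    params = {"q": query, "format": "json"}
    if engines:
        params["engines"] = ",".join(engines)
    return f"{base_url}/search?{urlencode(params)}"

def extract_hits(response_json: dict) -> list:
    """Pull (title, url) pairs out of a parsed SearXNG JSON response."""
    return [(r["title"], r["url"]) for r in response_json.get("results", [])]
```

An agent then feeds the extracted URLs to a crawler (Crawl4AI, in this setup) to fetch page content.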

I like that my AI can browse almost like me. I think this is the future way to consume a lot of the web (except sites like this one that are an actual pleasure to use).

The models sometimes hit sites they can't fetch. For those I use Firecrawl. An MCP proxy lets me rewrite the tool descriptions, so my models get access to both my local Crawl4AI and the hosted (and rather expensive) Firecrawl, but they are told to use Firecrawl only as a last resort.
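The cheap-local-first, expensive-hosted-fallback pattern is simple to express outside of tool descriptions too. A minimal sketch, with stand-in fetch functions (the names are illustrative, not the actual Crawl4AI or Firecrawl APIs):

```python
from typing import Callable, Optional

def fetch_with_fallback(url: str,
                        local_fetch: Callable[[str], Optional[str]],
                        hosted_fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try the free local crawler first; only fall back to the
    hosted (paid) service if the local one fails or returns nothing."""
    try:
        page = local_fetch(url)
        if page:
            return page
    except Exception:
        pass  # local crawler errored out; fall through to the paid service
    return hosted_fetch(url)
```

Pushing the same preference into the tool descriptions (as with the MCP proxy above) just lets the model make that call itself instead of hard-coding it.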

The more people use these kinds of solutions, the more incentive sites will have not to block users who rely on automation. Of course they will have to find alternative monetisation methods, but I think these stupid captchas will eventually disappear and reasonable rate limiting will prevail.

popcornricecake 7 hours ago | parent | next [-]

> I think this is the future way to consume a lot of the web

I think I see many prompt injections in your future. Like captchas with a special bypass solution just for AIs that leads to special content.

asfdasfsd 9 hours ago | parent | prev | next [-]

And what about people who block AI crawlers on moral grounds?

szundi 9 hours ago | parent | prev [-]

[dead]