| everfrustrated 3 hours ago |
I think what gets lost in this is that we should expect a lot more traffic from AI, simply because when I ask an AI to answer a question it does a lot more work and fetches from a lot more websites to generate a reply. And yes, searching over git repos will absolutely be part of that. This is all "legitimate" traffic in the sense that it isn't about crawling the internet for its own sake but is done in service of a real human. Put another way, search is moving from a model of crawling the internet and querying cached data to querying live data.
| ethin 3 hours ago |
I agree, and I think everyone here (and sysadmins everywhere), whether they agree with you or not, would be mostly fine with these AI crawlers if the companies behind them wrote them properly, followed best practices and standards, and didn't effectively DDoS servers or pretend to be something they aren't. But the crawlers are not written properly, they don't follow best practices or standards, they DDoS everything they're aimed at, and they go as far as masquerading as other things and hiding behind residential IP addresses (which I suspect could even be illegal, since it risks getting people who have no idea what AI is into trouble). That is, ultimately, what these AI companies are: very effective, for-sale, legal DDoSers. I don't think AI will replace search, if only because so much of the web is already blocked to these crawlers, and I'm sure that will only increase. And honestly, I doubt there is anything these companies could do to make sysadmins trust them again.
| hombre_fatal 3 hours ago |
In some ways that's true. But when it comes to git repos, an LLM agent like claude code can just clone them and crawl them locally, which is far better than crawling them remotely and is the "Right Way" for a few reasons: one clone replaces thousands of HTTP requests, and the agent gets full history and fast local search. Frankly, I suspect AI agents will push search in the opposite direction from your comment and move us toward distributed cache workflows. These tools hit the origin today because it's the easy solution, not because the data needs to be up to date to the millisecond. Imagine a system where all those Fetch(url) invocations go through a local LRU cache instead (sketched below). That would be really nice, and I think it's where we'd want to go, especially as more and more origin servers try to block automated traffic.
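A minimal sketch of that last idea in TypeScript, under some assumptions: the class name, cache capacity, and TTL are made up for illustration, and the only real API used is the standard fetch(). The point is just that repeated lookups of the same URL are served from memory, so only misses or stale entries hit the origin server.

    // Hypothetical LRU-cached fetch for an agent's page-fetch tool.
    // Capacity and TTL below are illustrative assumptions, not fixed numbers.
    type CacheEntry = { body: string; fetchedAt: number };

    class LruFetchCache {
      private cache = new Map<string, CacheEntry>();

      constructor(
        private maxEntries = 500,          // assumed capacity
        private ttlMs = 5 * 60 * 1000,     // assumed freshness window: 5 minutes
      ) {}

      async fetch(url: string): Promise<string> {
        const hit = this.cache.get(url);
        if (hit && Date.now() - hit.fetchedAt < this.ttlMs) {
          // Re-insert to mark as most recently used (Map preserves insertion order).
          this.cache.delete(url);
          this.cache.set(url, hit);
          return hit.body;
        }

        const res = await fetch(url);      // only a miss or stale entry hits the origin
        const body = await res.text();

        this.cache.set(url, { body, fetchedAt: Date.now() });
        if (this.cache.size > this.maxEntries) {
          // Evict the least recently used entry: the first key in insertion order.
          const oldest = this.cache.keys().next().value;
          if (oldest !== undefined) this.cache.delete(oldest);
        }
        return body;
      }
    }

    // Usage: the agent's tool would call cached.fetch(url) instead of fetch(url).
    const cached = new LruFetchCache();
    // const page = await cached.fetch("https://example.com/docs/page");

The same cache could be shared across tool invocations or even across agents on the same machine, which is roughly what "distributed cache workflows" would mean in practice.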