Jaxkr 17 hours ago

The author of this post could solve their problem with Cloudflare or any of its numerous competitors.

Cloudflare will even do it for free.

denkmoon 17 hours ago | parent | next [-]

Cool, I can take all my self-hosted stuff and stick it behind centralised enterprise tech to solve a problem caused by enterprise tech. Why even bother?

FeteCommuniste 17 hours ago | parent [-]

"Cause a problem and then sell the solution" proves a winning business strategy once more.

Shorel 11 hours ago | parent | prev | next [-]

Cloudflare seems to be taking over all of the last-mile web traffic, and this extreme centralization sounds really bad to me.

We should be able to achieve close to the same results with some configuration changes on our own servers: rate limiting, caching, that sort of thing (see the sketch below).

AWS / Azure / Cloudflare total centralization means no one will be able to self-host anything, which is exactly the point of this post.
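
To make "configuration changes" concrete, here is a minimal sketch of one self-hosted option, assuming nothing beyond the Python standard library: a per-IP token bucket you could put in front of request handling. (A real setup would more likely do this in the reverse proxy; the names here are illustrative.)

    import time
    from collections import defaultdict

    class TokenBucket:
        """Allow `rate` requests per second per IP, with bursts up to `burst`."""

        def __init__(self, rate=2.0, burst=10.0):
            self.rate = rate
            self.burst = burst
            # ip -> (tokens remaining, timestamp of last refill)
            self.state = defaultdict(lambda: (burst, time.monotonic()))

        def allow(self, ip):
            tokens, last = self.state[ip]
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at the burst size.
            tokens = min(self.burst, tokens + (now - last) * self.rate)
            if tokens < 1.0:
                self.state[ip] = (tokens, now)
                return False  # over budget: answer 429 and move on
            self.state[ip] = (tokens - 1.0, now)
            return True

This bounds what any single address can do; it won't stop a botnet on residential proxies, but per the comments below, neither, apparently, does Cloudflare.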

the_fall 17 hours ago | parent | prev | next [-]

They don't. I'm using Cloudflare and 90%+ of the traffic I'm getting is still broken scrapers, a lot of them coming through residential proxies. I don't know what they block, but they're not very good at it. Or, to be more fair: I think the scrapers have gotten really good at what they do because there's real money to be made.

esseph 14 hours ago | parent [-]

Probably more money in scraping than protection...

simonw 17 hours ago | parent | prev | next [-]

Cloudflare won't save you from this - see my comment here: https://news.ycombinator.com/item?id=46969751#46970522

Semaphor 16 hours ago | parent | prev | next [-]

For logging, statistics, etc. we have Cloudflare bot protection on the standard paid tier and ignore all IPs not from Europe (rough geolocation), and we still see over twice the number of bots we had ~2 years ago.

rubiquity 17 hours ago | parent | prev | next [-]

The scrapers should use some discretion; there are some rather obvious optimizations. Content that hasn't changed recently is less likely to change in the future, so it can be revisited less often and fetched conditionally (a sketch follows below).
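
A minimal sketch of that discretion, assuming plain HTTP and no particular crawler's internals: a conditional GET via ETag, plus an exponential back-off for pages that keep coming back unchanged.

    import urllib.error
    import urllib.request

    def fetch_if_changed(url, etag=None):
        """Fetch url only if it changed since we last saw `etag`."""
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)  # conditional GET
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read(), resp.headers.get("ETag")
        except urllib.error.HTTPError as err:
            if err.code == 304:  # Not Modified: server sent no body
                return None, etag
            raise

    def next_interval(prev_seconds, changed):
        # Unchanged content is less likely to change soon: double the
        # revisit interval up to a day, reset to an hour on a change.
        return 3600.0 if changed else min(prev_seconds * 2, 86400.0)

Even just honoring 304s would turn most of these full re-downloads into a handful of header bytes.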

JohnTHaller 17 hours ago | parent [-]

They don't care. That's why they ignore robots.txt and change up their user agents when you specifically block them.
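
For reference, the kind of rule being ignored looks like this in robots.txt (GPTBot is OpenAI's published crawler token; any other bot token works the same way):

    User-agent: GPTBot
    Disallow: /

A compliant crawler stops there; the scrapers in question either never read the file or come back under a generic browser user agent.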

overgard 17 hours ago | parent | prev | next [-]

I'm pretty sure scrapers aren't supposed to act as low-key DoS attacks

isodev 17 hours ago | parent | prev | next [-]

I think the point of the post was how something useless (AI) and its poorly implemented scrapers are wreaking havoc in a way that's turning the internet into a digital desert.

That Cloudflare is trying to monetise “protection from AI” is just another grift in the sense that they can’t help themselves as a corp.

fouc 17 hours ago | parent | prev [-]

You don't understand what self-hosting means. Self-hosting means the site is still up when AWS and Cloudflare go down.