snorremd 4 hours ago

I've recently been setting up web servers like Forgejo and Mattermost to serve my own and friends' needs. I ended up setting up CrowdSec to parse and analyse the access logs from Traefik and block bad actors that way: when an IP produces a bunch of 4XX codes in a short timeframe, I assume it is malicious and ban it for a couple of hours. That seems to deter a lot of random scraping. It doesn't stop well-behaved crawlers, though, which should only produce 200 codes.
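A minimal sketch of such a CrowdSec scenario, assuming the stock HTTP parsers (the event field names and the scenario name example/http-4xx-burst are my assumptions; check what your Traefik parser actually emits):

    # Leaky-bucket scenario: bucket overflow -> alert -> ban via a bouncer
    type: leaky
    name: example/http-4xx-burst
    description: "Ban IPs that produce a burst of 4XX responses"
    filter: "evt.Meta.service == 'http' && evt.Meta.http_status in ['400', '401', '403', '404']"
    groupby: evt.Meta.source_ip
    capacity: 10       # bucket overflows after ~10 matching events...
    leakspeed: "10s"   # ...arriving faster than one per 10 seconds
    blackhole: 2m      # don't re-alert on the same IP for 2 minutes
    labels:
      service: http
      type: scan
      remediation: true  # lets the bouncer turn the alert into a ban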

I'm actually not sure how I would go about stopping AI crawlers that are otherwise reasonably well behaved, considering they apparently don't identify themselves correctly and will ignore robots.txt.

lowdude 2 hours ago | parent | next

There was a comment in a different thread suggesting that they may respect robots.txt for the most part but ignore wildcards: https://news.ycombinator.com/item?id=46975726

Maybe this is worth trying out first, if you are currently having issues.
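If you want to test that theory, one option is a robots.txt that names crawlers explicitly instead of relying only on the * wildcard. A sketch (GPTBot and CCBot are real AI crawler user-agent tokens; extend the list to whichever bots show up in your logs):

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # keep the wildcard rule for bots that do honour it
    User-agent: *
    Disallow: /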

V__ 4 hours ago | parent | prev

If possible, I would block by country first. Even on public websites I block Russia/China by default, and that reduced port scans etc.
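Country blocking usually lives below the reverse proxy, e.g. at the firewall with a GeoIP-derived address set. A minimal nftables sketch, assuming you fetch a per-country CIDR list separately (the set name and the CIDRs are placeholders):

    table inet filter {
        set blocked_countries {
            type ipv4_addr
            flags interval
            # populate from a GeoIP country CIDR dump; these are placeholders
            elements = { 198.51.100.0/24, 203.0.113.0/24 }
        }
        chain input {
            type filter hook input priority 0; policy accept;
            ip saddr @blocked_countries drop
        }
    }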

On "private" services where I or my friends are the only users, I block everything except my country.