| ▲ | ninjin 2 hours ago |
| I can report that Facebook does not respect robots.txt. Heck, I even mailed domain@fb.com with the specific IP ranges and log samples three times over a month and they of course did not even respond. Keeps on wasting my CPU cycles to this day by crawling massive development forks (I hope they choke on the data...): $ (cat /var/www/logs/access.log; zcat /var/www/logs/access.log*.gz) | grep 2a03:2880: | wc -l
626396
About three hits per second for months now. |
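For anyone who wants a stricter count than the grep pipeline above, here is a minimal Python sketch. It only counts a line when the client address itself falls inside the prefix, so a stray `2a03:2880:` in a URL or referrer is not counted. The `/32` network is an assumption inferred from the grep pattern, and the guess that the client address is the first or second whitespace-separated field is also an assumption about the log format; `is_crawler`, `open_log`, and `count_hits` are illustrative names, not anything from the original post.

```python
import gzip
import ipaddress
from pathlib import Path

# Assumption: the /32 is inferred from the "2a03:2880:" grep pattern.
CRAWLER_NET = ipaddress.ip_network("2a03:2880::/32")


def is_crawler(line, net=CRAWLER_NET):
    """Return True if one of the first two fields is an address inside net."""
    # Assumption: the client address is the first or second field of the line.
    for field in line.split()[:2]:
        try:
            if ipaddress.ip_address(field) in net:
                return True
        except ValueError:
            continue  # field is not an IP address (e.g. a host name or "-")
    return False


def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def count_hits(log_dir, pattern="access.log*"):
    """Count matching lines across the current and rotated logs in log_dir."""
    total = 0
    for path in sorted(Path(log_dir).glob(pattern)):
        with open_log(path) as fh:
            total += sum(1 for line in fh if is_crawler(line))
    return total
```

Calling `count_hits("/var/www/logs")` would cover the same files as the shell command, assuming the rotated logs follow the `access.log*.gz` naming used there.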
|
| ▲ | dylan604 an hour ago | parent | next [-] |
| Can you serve them a specific file that would make it expensive on their end? |
| |
| ▲ | ninjin an hour ago | parent [-] |
| If I had the time and energy, I would make some sort of simple code language model and generate infinite junk and feed that to them in the hope that it ruins their future training runs. But, I lack the former and some of the latter. Alternatively, maybe I would actually read one of those "backdoor papers" and try to inject something like that. |
| ▲ | dylan604 an hour ago | parent [-] |
| I was wondering if this could be done without being malicious to that level. If they are costing you money, then I have no moral qualms about responding in kind. Taking that next step, though, would give up the moral high ground and potentially put you on legally questionable ground. I get the lack of time/energy for this type of thing. It is one of those projects that could be satisfying for yourself, but very hard to justify if you're a family person; a younger person might get a lot of pleasure from it. |
|
| ▲ | drcongo an hour ago | parent | prev [-] |
| I block their entire ASN when they do that. |
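As a rough illustration of what blocking a whole ASN involves: it comes down to collecting the prefixes that ASN announces and rejecting any client inside them. The sketch below assumes you already have that prefix list from somewhere (how to obtain it is not covered here). The `192.0.2.0/25` entries are documentation placeholders, `2a03:2880::/32` is taken from the grep pattern earlier in the thread, and `build_blocklist` and `is_blocked` are hypothetical helper names. In practice the resulting prefixes would more likely go into a firewall table than into application code.

```python
import ipaddress


def build_blocklist(prefixes):
    """Parse prefix strings and merge adjacent or overlapping networks."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses() rejects mixed IP versions, so handle them separately.
    v4 = ipaddress.collapse_addresses(n for n in nets if n.version == 4)
    v6 = ipaddress.collapse_addresses(n for n in nets if n.version == 6)
    return list(v4) + list(v6)


def is_blocked(addr, blocklist):
    """Return True if addr falls inside any network on the blocklist."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in blocklist)


# Placeholder prefixes for illustration only, not a real ASN's announcements.
blocklist = build_blocklist(["192.0.2.0/25", "192.0.2.128/25", "2a03:2880::/32"])
```

Merging the prefixes first keeps the list short, which matters more once an ASN announces hundreds of routes.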