| ▲ | kstrauser 21 hours ago |
| I love the insanity of this idea. Not saying it's a good idea, but it's a very highly entertaining one, and I like that! I've also had enormous luck with Anubis. AI scrapers found my personal Forgejo server and were hitting it on the order of 600K requests per day. After setting up Anubis, that dropped to about 100. Yes, some people are going to see an anime catgirl from time to time. Bummer. Reducing my fake traffic by a factor of 6,000 is worth it. |
|
| ▲ | anonymous908213 18 hours ago | parent | next [-] |
| As someone on the browsing end, I love Anubis. I've only seen it a couple of times, but it sparks joy. It's rather refreshing compared to Cloudflare, which will usually make me immediately close the page and not bother with whatever content was behind it. |
| |
| ▲ | teeray 14 hours ago | parent | next [-] | | It really reminds me of old Internet, when things were allowed to be fun. Not this tepid corporate-approved landscape we have now. | |
| ▲ | kstrauser 18 hours ago | parent | prev | next [-] | | Same here, really. That's why I started using it. I'd seen it pop up for a moment on a few sites I'd visited, and it was so quirky and completely not disruptive that I didn't mind routing my legit users through it. | | |
| ▲ | n1xis10t 18 hours ago | parent [-] | | So maybe there are more people who like the “anime catgirl” than there are who think it’s weird | | |
| ▲ | kstrauser 17 hours ago | parent [-] | | *anime jackalgirl ;-) Quite possibly. Or, in my case, I think it's more quirky and fun than weird. It's non-zero amounts of weird, sure, but far below my threshold of troublesome. I probably wouldn't put my business behind it. I'm A-OK with using it on personal and hobby projects. Frankly, anyone so delicate that they freak out at the utterly anodyne imagery is someone I don't want to deal with in my personal time. I can only abide so much pearl clutching when I'm not getting paid for it. | | |
|
| |
| ▲ | prmoustache 5 hours ago | parent | prev | next [-] | | Anyone is free to replace the cat girl with an actual cat or a vintage computer logo or whatnot anyway. My issue is that it blocks people using browsers without JavaScript. | | |
| ▲ | stefanka 4 hours ago | parent [-] | | How can one do this? Did not find it in the docs | | |
| ▲ | easton 4 hours ago | parent [-] | | It’s a feature in the paid version, or I guess you could recompile it if you didn’t want to pay (but my guess is if you want to change the logo you can probably pay). | | |
|
| |
| ▲ | brettermeier 2 hours ago | parent | prev | next [-] | | Reminds me of weird furry porn, I can't say I like it | |
| ▲ | acheong08 7 hours ago | parent | prev | next [-] | | As someone on the hosting end, Anubis has unfortunately been overused and thus scrapers, especially Huawei ones, bypass it. I've gone for go-away instead which is similar but more configurable in challenges | |
| ▲ | PunchyHamster 8 hours ago | parent | prev | next [-] | | My experience with it is that it somehow took 20 seconds to load (site might've been hn-hugged at the time), only to "protect" some fucking static page instead of just serving that shit in the first place rather than wasting CPU on... whatever it was doing to cause delay | | |
| ▲ | timpera 5 hours ago | parent [-] | | Same experience for me. I tried it on a low-end smartphone and the Anubis challenge took about 45 seconds to complete. |
| |
| ▲ | m4rtink 8 hours ago | parent | prev [-] | | Yep, Anubis-chan is super cute! :) |
|
|
| ▲ | n1xis10t 20 hours ago | parent | prev | next [-] |
| That’s so many scrapers. There must be a ton of companies with very large document collections at this point, and it really sucks that they don’t at least do us the courtesy of indexing them and making them available for keyword search, but instead only do AI. It’s kind of crazy how much scraping goes on and how little search engine development goes on. I guess search engines aren’t fashionable. Reminds me of this article about search engines disappearing mysteriously: https://archive.org/details/search-timeline I try to share that article as much as possible, it’s interesting. |
| |
| ▲ | kstrauser 19 hours ago | parent | next [-] | | So! Much! Scraping! They were downloading every commit multiple times, and fetching every file as seen at each of those commits, and trying to download archives of all the code, and hitting `/me/my-repo/blame` endpoints as their IP's first-ever request to my server, and other unlikely stuff. My scraper dudes, it's a git repo. You can fetch the whole freaking thing if you wanna look at it. Of course, that would require work and context-aware processing on their end, and it's easier for them to shift the expense onto my little server and make me pay for their misbehavior. | | | |
| ▲ | PeterStuer 9 hours ago | parent | prev | next [-] | | Or some anti-ddos/bot companies using ultra cheap scraping services to annoy you enough to get you into their "free" anti bot protection, so they can charge the few real ai scrapers for access to your site. | | |
| ▲ | throw10920 3 hours ago | parent [-] | | Is there any evidence that this has actually happened? | | |
| ▲ | zhengyi13 38 minutes ago | parent [-] | | Even if there isn't (yet?), there's probably someone who's honestly thinking this is potentially a viable business model and at least napkin-mathing it out. |
|
| |
| ▲ | miki123211 10 hours ago | parent | prev | next [-] | | But there is a lot of search engine development going on, it's just that the results of the new search engines are fed straight into AI instead of displayed in the legacy 10-links-per-page view. | |
| ▲ | mrweasel 7 hours ago | parent | prev [-] | | > There must be a ton of companies with very large document collections at this point See, I don't think there are; I don't think they want that expense. It's basically the Linus Torvalds philosophy of data storage: if it's on the Internet, I don't need a backup. While I have absolutely no proof of this, I'd guess that many AI companies just crawl the Internet constantly, never saving any of the data. We're seeing some of these scrapers go to great lengths attempting to circumvent any and all forms of caching; they aren't interested in having a two-week-old copy of anything. | | |
| ▲ | kelvinjps10 an hour ago | parent | next [-] | | Where did Linus Torvalds express this philosophy? I have never seen it. | | | |
| ▲ | n1xis10t 2 hours ago | parent | prev [-] | | Could be. Can you train a model without saving things though? |
|
|
|
| ▲ | amypetrik8 an hour ago | parent | prev | next [-] |
| >I love the insanity of this idea. Not saying it's a good idea, but it's a very highly entertaining one, and I like that! An even more insane idea -- minding the idea here is porn is radioactive to AI data training scrapers -- is there is something the powers that be view as far more disruptive and against community guidelineish than porn. And that would be wrongthink. The narratives. The historic narratives. The woke ideology. Anything related to an academic department whose field is <population subgroup> studies. Alls you need to do is plop in a little diatribe staunchly opposing any such enforced views and that AI bot will shoot away from your website at lightspeed |
| |
| ▲ | lelanthran 39 minutes ago | parent [-] | | I like this better than NSFW links; just include a (possibly LLM-generated) paragraph about not supporting transitions in minor children. Or perhaps that libraries that remove instructional booklets for how to have same-sex intercourse aren't actually banning the books. That sort of thing; nothing that 80% of people object to (so there's no problem if someone actually sees it), but something that definitely triggers the filters. |
|
|
| ▲ | n1xis10t 20 hours ago | parent | prev | next [-] |
| *anime jackalgirl Also you mentioned Anubis, so its creator will probably read this. Hi Xena! |
| |
| ▲ | xena 16 hours ago | parent | next [-] | | Ohai! I'm working on dataset poisoning. The early prototype generates vapid LinkedIn posts but future versions will be fully pluggable with WebAssembly. | | |
| ▲ | mrweasel 7 hours ago | parent | next [-] | | Now I'm picturing an AI trained exclusively on LinkedIn posts. One could probably sell that model to an online ad agency for a pretty penny. | | | |
| ▲ | tommica 12 hours ago | parent | prev | next [-] | | Hi Xena! Your blog is amazing! Didn't realize you're working on Anubis - it's a really nice tool for the internet! Reminds me a bit of the ye' olde internet for some reason. | |
| ▲ | gettingoverit 11 hours ago | parent | prev | next [-] | | You've made one of the best solutions, that matched what I thought of implementing myself, and at the time it was most needed. I think a couple of "thank you" are sorely missing in this comment section. Thank you! | |
| ▲ | n1xis10t 16 hours ago | parent | prev [-] | | That sounds fun, I look forward to reading a writeup about that | | |
| ▲ | xena 15 hours ago | parent [-] | | So I can plan it, how much detail do you want? Here's what I have about the prototype: https://anubis.techaro.lol/docs/admin/honeypot/overview | | |
| ▲ | 63stack 7 hours ago | parent | next [-] | | This is amazing, I was just wondering if it's possible to tie Anubis together with iocaine, but it seems you already thought of that. | | |
| ▲ | xena 5 hours ago | parent [-] | | It's slightly different in subtle ways. If I recall iocaine makes you configure a subprocess that it executes to generate garbage. One rule I have for Anubis in the code is that fork()/exec() are banned. So the pluggable garbage generator is gonna be powered by CGI handlers compiled to WebAssembly. It should be fun! |
| |
| ▲ | n1xis10t 15 hours ago | parent | prev | next [-] | | Probably any detail that you think is cool, I would be interested in reading about. When in doubt err on the side of too much detail. That was a good read. I hadn’t heard of spintax before, but I’ve thought of doing things like that. Also “pseudoprofound anti-content”, what a great term, that’s hilarious! | |
| ▲ | kstrauser 13 hours ago | parent | prev [-] | | As the owner of honeypot.net, I always appreciate seeing the name used as intended out in the wild. |
|
|
| |
| ▲ | ramonga 8 hours ago | parent | prev | next [-] | | what do people use to get keyword alerts in HN? | | |
| ▲ | n1xis10t 2 hours ago | parent [-] | | I think that most people don't do this, and the ones that do have custom solutions. Xena's uses cron, I believe, but that's all I know. It's probably a custom shell script. |
| |
| ▲ | kstrauser 20 hours ago | parent | prev | next [-] | | Correct; my bad! And hey, Xena! (And thank you very much!) | |
| ▲ | ziml77 19 hours ago | parent | prev | next [-] | | I checked Xe's profile when I hadn't seen them post here for a while. According to that, they're not really using HN anymore. | | | |
| ▲ | GaryBluto 6 hours ago | parent | prev [-] | | [dead] |
|
|
| ▲ | buu700 16 hours ago | parent | prev | next [-] |
| It's actually a well established concept: https://youtu.be/p9KeopXHcf8 |
|
| ▲ | tonymet 3 hours ago | parent | prev [-] |
| Funny how that also uses a porn cartoon |
| |
| ▲ | kstrauser 2 hours ago | parent [-] | | Which cartoon are you referring to? The version of Anubis I installed only has the G-rated default images. |
|