| ▲ | johnklos 6 days ago |
| This is a usually technical crowd, so I can't help but wonder if many people genuinely don't get it, or if they are just feigning a lack of understanding to be dismissive of Anubis. Sure, the people who make the AI scraper bots are going to figure out how to actually do the work. The point is that they hadn't, and this worked for quite a while. As the botmakers circumvent it, new methods of proof-of-notbot will be made available. It's really as simple as that. If a new method comes out and your site is safe for a month or two, great! That's better than dealing with fifty requests a second, wondering if you can block whole netblocks, and if so, which. This is like those simple things on submission forms that ask you what 7 + 2 is. Of course everyone knows that a crawler can calculate that! But it takes a human some time and work to tell the crawler HOW. |
|
| ▲ | palata 6 days ago | parent | next [-] |
| > they are just feigning a lack of understanding to be dismissive of Anubis. I actually find the featured article very interesting. It doesn't feel dismissive of Anubis, but rather it questions whether this particular solution makes sense or not in a constructive way. |
| |
| ▲ | johnklos 6 days ago | parent [-] | | I agree - the article is interesting and not dismissive. I was talking more about some of the people here ;) | | |
| ▲ | dmesg 6 days ago | parent [-] | | I still don't understand what Anubis solves if it can be bypassed too easily: if you use a user-agent switcher (I emulate wget) as a Firefox addon on kernel.org or ffmpeg.org, you save the entire check time and straight up skip Anubis. Apparently they use a whitelist for user-agents due to allowing legitimate wget usage on these domains. However, if I (an honest human) can, the scrapers and grifters can too. https://addons.mozilla.org/en-US/firefox/addon/uaswitcher/ if anyone wants to try it themselves. This is by no means against Anubis, but it raises the question: can you even protect a domain if you force yourself to whitelist (for a full bypass) easy-to-guess UAs? | | |
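For concreteness, here is a minimal sketch in Go of the bypass described above: the request simply presents a wget-style User-Agent. The URL and UA string are placeholders, and whether this actually skips the interstitial depends entirely on how a given site's whitelist is configured.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Sketch only: fetch a page while presenting a wget-like User-Agent.
	// If the site whitelists wget-style agents, the Anubis interstitial is
	// never served; otherwise this request gets challenged like any other.
	req, err := http.NewRequest("GET", "https://example.org/", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("User-Agent", "Wget/1.21.4")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
}
```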
| ▲ | hooverd 5 days ago | parent [-] | | It's extra work for scrapers. They pretend to be upstanding citizens (Chrome UA from residential IPs). You can more easily block those. | | |
| ▲ | 20after4 5 days ago | parent [-] | | A lot of scrapers are actually utilizing some malware installed on residential users' machines, so the request is legitimately coming from a Chrome UA on a residential IP. | | |
|
|
|
|
|
| ▲ | technion 6 days ago | parent | prev | next [-] |
| It really should be recognised just how many people are sitting through Cloudflare interstitials on nearly every site these days (and I totally get why this happens) yet making a huge amount of noise about Anubis on a very small number of sites. |
| |
| ▲ | mlyle 6 days ago | parent | next [-] | | I don't trip over CloudFlare except when in a weird VPN, and then it always gets out of my way after the challenge. Anubis screws with me a lot, and often doesn't work. | | |
| ▲ | dijit 6 days ago | parent | next [-] | | The annoying thing about Cloudflare is that most of the time, once you’re blocked: you’re blocked. There’s literally no way for you to bypass the block if you’re affected. It’s incredibly scary. I once had a bad user agent (without knowing it) and half the internet went offline; I couldn’t even access documentation or my email provider’s site, and there was no contact information or debugging information to help me resolve it: just a big middle finger for half the internet. I haven’t had issues with any sites using Anubis (yet), but I suspect there are ways to verify that you’re a human if your browser fails the automatic check at least. | | |
| ▲ | zorked 6 days ago | parent | next [-] | | CloudFlare is dystopic. It centralizes even the part of the Internet that hadn't been centralized before. It is a perfect Trojan horse to bypass all encryption. And it chooses who accesses (a considerable chunk of) the Internet and who doesn't. Anubis looks much better than this. | | |
| ▲ | jijijijij 6 days ago | parent | next [-] | | It's literally insane. After Snowden, how the fuck did we end up with a single US company terminating almost every TLS connection? | | |
| ▲ | jgalt212 2 days ago | parent [-] | | They don't have to terminate every TLS connection. That's just the happy path. |
| |
| ▲ | robertlagrant 6 days ago | parent | prev [-] | | > It is a perfect Trojan horse to bypass all encryption Isn't any hosting provider also this? | | |
| ▲ | dijit 6 days ago | parent [-] | | Not necessarily. FaaS: Yes. IaaS: Only if you do TLS termination at their gateway, otherwise not really, they'd need to get into your operating system to get the keys which might not always be easy. They could theoretically MITM the KVM terminal when you put in your disk decryption keys but that seems unlikely. |
|
| |
| ▲ | piltdownman 6 days ago | parent | prev | next [-] | | It could be a lot worse. Soccer rights-holders effectively shut down the Cloudflare-facilitated Internet in Spain during soccer matches to 'curb piracy'. The soccer rightsholders - LaLiga - claim more than 50% of pirate IPs illegally distributing its content are protected by Cloudflare. Many were using an application called DuckVision to facilitate this streaming. Telefónica, the ISP, upon realizing they couldn’t directly block DuckVision’s IP or identify its users, decided on a drastic solution: blocking entire IP ranges belonging to Cloudflare, which continues to affect a huge number of services that had nothing to do with soccer piracy. https://pabloyglesias.medium.com/telef%C3%B3nicas-cloudflare... https://www.broadbandtvnews.com/2025/02/19/cloudflare-takes-... https://community.cloudflare.com/t/spain-providers-blocks-cl... |
| ▲ | ehnto 6 days ago | parent | prev | next [-] | | Now imagine your government provided internet agent gets blacklisted because your linked social media post was interpreted by an LLM to be anti-establishment, and we are painting a picture of our current trajectory. | | |
| ▲ | petralithic 6 days ago | parent | next [-] | | I don't have to imagine | |
| ▲ | thrown-0825 3 days ago | parent | prev [-] | | Most of the people on this site work or worked for companies who enabled this or specifically sold it as a feature. We are all complicit. |
| |
| ▲ | kedihacker 6 days ago | parent | prev | next [-] | | Anubis checks proof of work so as long as JavaScript runs you will pass it. | |
| ▲ | Dilettante_ 6 days ago | parent | prev [-] | | A "digital no-fly-list" is hella cyberpunk, though. | | |
| ▲ | ehnto 6 days ago | parent [-] | | The question might become, what side of the black wall are you going to be on? Seriously though I do think we are going to see increasing interest in alternative nets, especially as governments tighten their control over the internet or even break away into isolated nation nets. | | |
| ▲ | pjc50 6 days ago | parent [-] | | Paradoxically, the problem with an "alternative net" (which could be tunneled over the regular one) is keeping it alternative. It has to be kept small and un-influential in order to stay under the radar. If you end up with an "alternative" which is used by journalists and politicians, you've just reinvented the mainstream, and you're no longer safe from being hit by a policy response. Think private trackers. The opposite of 4chan, which is an "alternative" that got too influential in setting the tone of the rest of the internet. | | |
| ▲ | dijit 6 days ago | parent [-] | | Not necessarily. Yggdrasil flies under the radar because it's inherently hard to block. Tor even more so: the power of Tor is that the more people use it, the stronger it becomes against centralised adversaries. The main issue with Tor is its performance, though. | | |
| ▲ | thfuran 5 days ago | parent | next [-] | | I thought that the main issue with tor was that so many of the exit nodes are actually the FBI. | | |
| ▲ | dijit 5 days ago | parent [-] | | You don't ever have to leave the Tor network. I host IRC on a hidden service, and even Facebook (lol) offers a hidden service endpoint. All that is needed is for a critical mass of people and a decent index: and we successfully have reinvented "the wired" from Serial Experiments: Lain |
| |
| ▲ | flkenosad 6 days ago | parent | prev [-] | | The truth is the internet was never designed or intended to host private information. It was created for scientists by scientists to share research papers. Capitalists perverted it. |
|
|
|
|
| |
| ▲ | binaryturtle 6 days ago | parent | prev | next [-] | | I'm on an older system here, and both Cloudflare and Anubis entirely block me out of sites. Once you start blocking actual users out of your sites, it simply has gone too far. At least provide an alternative method to enter your site (e.g. via login) that's not hampered by erroneous human checks. Same for the captchas where you help train AIs by choosing out of a set of tiny/ noisy pictures. I often struggle for 5 to 10 minutes to get past that nonsense. I heard bots have less trouble. Basically we're already past the point where the web is made for actual humans, now it's made for bots. | | |
| ▲ | inejge 6 days ago | parent | next [-] | | > Once you start blocking actual users out of your sites, it simply has gone too far. It has, scrapers are out of control. Anubis and its ilk are a desperate measure, and some fallout is expected. And you don't get to dictate how a non-commercial site tries to avoid throttling and/or bandwidth overage bills. | | |
| ▲ | account42 6 days ago | parent [-] | | No, they are a lazy measure. Most websites that slap on these kinds of checks don't even bother with more human-friendly measures first. | | |
| ▲ | mschuster91 6 days ago | parent [-] | | Because I don't have the fucking time to deal with AI scraper bots. I went harder - anything even looking suspiciously close to a scraper that's not on Google's index [1] or has wget in its user agent gets their entire /24 hard banned for a month, with an email address to contact for unbanning. That seems to be a pretty effective way for now to keep scrapers, spammers and other abusive behavior away. Normal users don't do certain site actions at the speed that scraper bots do, there's no other practically relevant search engine than Google, I've never ever seen an abusive bot hide as wget (they all try to emulate looking like a human operated web browser), and no AI agent yet is smart enough to figure out how to interpret the message "Your ISP's network appears to have been used by bot activity. Please write an email to xxx@yyy.zzz with <ABC> as the subject line (or click on this pre-filled link) and you will automatically get unblocked". [1] https://developers.google.com/search/docs/crawling-indexing/... | | |
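A minimal sketch of that policy, assuming nothing beyond what is described above (the function names, the scraper heuristic, and the one-month duration are illustrative, not the commenter's actual code): a request that trips the heuristic and is neither a verified Google crawler nor a wget client gets its whole /24 banned.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
	"time"
)

// bannedUntil maps a /24 prefix to the time its ban expires.
var bannedUntil = map[netip.Prefix]time.Time{}

// shouldBan applies the rule as described: only suspicious traffic is
// considered, and verified Google crawlers or wget-identifying clients
// are exempt from the ban.
func shouldBan(userAgent string, isGoogleCrawler, looksLikeScraper bool) bool {
	if !looksLikeScraper {
		return false
	}
	if isGoogleCrawler || strings.Contains(strings.ToLower(userAgent), "wget") {
		return false
	}
	return true
}

// banNetblock bans the /24 containing the offending address for one month.
func banNetblock(remote netip.Addr) {
	prefix, err := remote.Prefix(24)
	if err != nil {
		return
	}
	bannedUntil[prefix] = time.Now().AddDate(0, 1, 0)
}

func main() {
	addr := netip.MustParseAddr("203.0.113.7") // placeholder scraper IP
	if shouldBan("Mozilla/5.0 (X11; Linux x86_64)", false, true) {
		banNetblock(addr)
	}
	fmt.Println(bannedUntil)
}
```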
| ▲ | account42 6 days ago | parent [-] | | > Normal users don't do certain site actions at the speed that scraper bots do How would you know, when you have already banned them? | | |
| ▲ | mschuster91 6 days ago | parent [-] | | Simple. A honeypot link in a three-levels-deep menu which no ordinary human would care about and which, thanks to a JS animation, needs at least half a second for a human to click on. Any bot that clicks it in less than half a second gets the banhammer. No need for invasive tracking, third-party integrations, whatever. | | |
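A sketch of that timing check, under the assumption that the server knows when it served the page (for example via a timestamp baked into the honeypot URL; the names here are made up): a click on the honeypot arriving in under half a second is treated as a bot.

```go
package main

import (
	"fmt"
	"time"
)

// minHumanDelay is the JS-animation delay described above: no human can
// reach the buried honeypot link faster than this.
const minHumanDelay = 500 * time.Millisecond

// isBotClick reports whether the honeypot was clicked implausibly fast
// relative to when the page was served.
func isBotClick(pageServedAt, clickedAt time.Time) bool {
	return clickedAt.Sub(pageServedAt) < minHumanDelay
}

func main() {
	served := time.Now()
	click := served.Add(120 * time.Millisecond) // far too fast for a human
	fmt.Println("ban this client:", isBotClick(served, click))
}
```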
| ▲ | account42 6 days ago | parent [-] | | That does sound like a much more human-friendly approach than Anubis. I agree that tarpits and honeypots are a good stopgap until the legal system catches up to the rampant abuse of these "AI" companies. It's when your solutions start affecting real human users just because they are not "normal" in some way that I stop being sympathetic. |
|
|
|
|
| |
| ▲ | alperakgun 6 days ago | parent | prev | next [-] | | I gave up on a lot of websites because of the aggressive blocking. | |
| ▲ | johnklos 6 days ago | parent | prev [-] | | FYI - you can communicate with the author of Anubis, who has already said she's working on ways to make sure that all browsers - links, lynx, dillo, midori, et cetera, work. Unless you're paying Cloudflare a LOT of money, you won't get to talk with anyone who can or will do anything about issues. They know about their issues and simply don't care. If you don't mind taking a few minutes, perhaps put some details about your setup in a bug report? |
| |
| ▲ | necovek 6 days ago | parent | prev | next [-] | | It's the other way around for me sometimes — I've never had issue with Anubis, I frequently get it with CF-protected sites. (Not to mention all the sites which started putting country restrictions in on their generally useful instruction articles etc — argh) | |
| ▲ | Pinus 6 days ago | parent | prev | next [-] | | I’m planning a trip to France right now, and it seems like half the websites in that country (for example, ratp.fr for Paris public transport info) require me to check a CloudFlare checkbox to promise that I am a human. And of those that don’t, quite a few just plain lock me out... | | |
| ▲ | ta988 6 days ago | parent | next [-] | | And a lot of US sites don't work in France either, or they ban you after just a couple requests with no appeal... | |
| ▲ | Symbiote 6 days ago | parent | prev | next [-] | | I find the same when using some foreign sites. I think the operator must have configured that France is OK, maybe neighboring countries too, the rest of the world must be checked. | |
| ▲ | alibarber 6 days ago | parent | prev [-] | | It's not hard to understand why though surely? You might have to show a passport when you enter France, and have your baggage and person (intrusively) scanned if you fly there, for much the same reason. People, some of them in positions of government in some nation states, want to cause harm to the services of other states. Cloudflare was probably the easiest tradeoff for balancing security of the service with accessibility and cost to the French/Parisian taxpayer. Not that I'm happy about any of this, but I can understand it. | | |
| ▲ | inferiorhuman 6 days ago | parent [-] | | The antagonists in this case are not state sponsored terrorists, instead it's AI bros DDoSing the internet. |
|
| |
| ▲ | thayne 6 days ago | parent | prev | next [-] | | I get one basically every time I go to gitlab.com on Firefox. It is easy to pass the challenge, but it isn't any better than Anubis. | |
| ▲ | NoGravitas 6 days ago | parent | prev | next [-] | | Even when not on VPN, if a site uses the CloudFlare interstitials, I will get it every single time - at least the "prove you're not a bot" checkbox. I get the full CAPTCHA if I'm on a VPN or I change browsers. It is certainly enough to annoy me. More than Anubis, though I do think Anubis is also annoying, mainly because of being nearly worthless. | |
| ▲ | wongarsu 6 days ago | parent | prev | next [-] | | For me both are things that mostly show up for 1-3 seconds, then get replaced by the actual website. I suspect that's the user experience of 99% of people. If you fall in the other 1% (e.g. due to using unusual browsers or specific IP ranges), cloudflare tends to be much worse | |
| ▲ | immibis 6 days ago | parent | prev [-] | | You must be on a good network. You should run one of those "get paid to share your internet connection with AI companies" apps. Since you're on a good network you might make a lot of money. And then your network will get cloudflared, of course. We should repeat this until every network is cloudflared and everyone hates cloudflare and cloudflare loses all its customers and goes bankrupt. The internet would be better for it. |
| |
| ▲ | elric 6 days ago | parent | prev | next [-] | | I hit Cloudflare's garbage about as much as I hit Anubis. With the difference that far more sites use Cloudflare than Anubis, thus Anubis is far worse at triggering false positives. | | |
| ▲ | Aachen 6 days ago | parent | next [-] | | Huh? What false positives does Anubis produce? The article doesn't say and I constantly get the most difficult Google captchas, cloudflare block pages saying "having trouble?" (which is a link to submit a ticket that seems to land in /dev/null), IP blocks because of user agent spoofing, errors "unsupported browser" when I don't do user agent spoofing... the only anti-bot thing that reliably works on all my clients is Anubis. I'm really wondering what kinds of false positives you think Anubis has, since (as far as I can tell) it's a completely open and deterministic algorithm that just lets you in if you solve the challenge, and as the author of the article demonstrated with some C code (if you don't want to run the included JavaScript that does it for you), that works even if you are a bot. And afaik that's the point: no heuristics and false positives but a straight game of costs; making bad scraping behavior simply cost more than implementing caching correctly or using commoncrawl. | | |
| ▲ | jakogut 6 days ago | parent [-] | | I've had Anubis repeatedly fail to authorize me to access numerous open source projects, including the mesa3d gitlab, with a message looking something like "you failed". As a legitimate open source developer and contributor to buildroot, I've had no recourse besides trying other browsers, networks, and machines, and it's triggered on several combinations. | | |
| ▲ | stock_toaster 5 days ago | parent | next [-] | | It sounds[1] like this was an issue with assumptions regarding header stability. Hopefully as people update their installations things will improve for us end users. [1]: https://anubis.techaro.lol/blog/release/v1.20.0/#chrome-wont... | | |
| ▲ | jakogut 5 days ago | parent [-] | | Thank goodness. It was feeling quite dystopian being caught in a bot dragnet that blocked me from resources that are relevant and vital to my work. |
| |
| ▲ | Aachen 6 days ago | parent | prev [-] | | Interesting, I didn't even know it had such a failure mode. Thanks for the reply, I'll sadly have to update my opinion on this project since it's apparently not a pure "everyone is equal if they can Prove the Work" system as I thought :( I'm curious how, though, since the submitted article doesn't mention that and demonstrates curl working (which is about as low as you can go on the browser emulation front), but no time to look into it atm. Maybe it's because of an option or module that the author didn't have enabled |
|
| |
| ▲ | analbliss 6 days ago | parent | prev [-] | | So yes, it is like having a stalker politely open the door for you as you walk into a shop, because they know very well who you are. | | |
| ▲ | robertlagrant 6 days ago | parent | next [-] | | In a world full of robots that look like humans, the stalker who knows you and lets you in might be the only solution. | | |
| ▲ | Aachen 6 days ago | parent | next [-] | | That's called authentication. In the case of the stalker, by biometrics (facial recognition). This could be a solution But that's not what Cloudflare does. Cloudflare guesses whether you are a bot and then either blocks you or not. If it currently likes you, bless your luck | | |
| ▲ | KETHERCORTEX 5 days ago | parent [-] | | > This could be a solution Until the moment someone figures out how to generate realistic enough 3D faces. | | |
| ▲ | Aachen 4 days ago | parent [-] | | Ah true! I meant authentication in general by whatever means, which seems dystopian enough already, but indeed my post can be read as being about facial recognition being required to visit random websites... that's even worse! Don't give them ideas xD |
|
| |
| ▲ | petralithic 6 days ago | parent | prev [-] | | That stalker might itself be a bot though, so there's no solution. |
| |
| ▲ | rob_c 6 days ago | parent | prev [-] | | [flagged] | | |
|
| |
| ▲ | tgv 6 days ago | parent | prev | next [-] | | That says something about the chosen picture, doesn't it? Probably that it's not well liked. It certainly isn't neutral, while the Cloudflare page is. | | |
| ▲ | drakythe 6 days ago | parent | next [-] | | You know, you say that, and while I understand where you're coming from I was browsing the git repo when github had a slight error and I was greeted with an angry pink unicorn. If Github can be fun like that, Anubis can too, I think. | | |
| ▲ | MintPaw 6 days ago | parent | next [-] | | Yeah, but do people like that? It feels pretty patronizing to me in a similar way. Like "Weee! So cute that our website is broken, good luck doing your job! <3" Reminds me of the old uwu error message meme. | | |
| ▲ | jijijijij 6 days ago | parent [-] | | > patronizing I think it's reasonable and fair, and something you are expected to tolerate in a free world. In fact, I think it's rather unusual to take this benign and inconsequential thing as personal as you do. | | |
| ▲ | lsh0 5 days ago | parent [-] | | Not at all. I can't stand it either. It's definitely patronising and infantile. I tolerate the silliness, grit my teeth and move on but it wears away at my patience. | | |
| ▲ | korse 4 days ago | parent [-] | | This is why I add anime catgirls to nearly everything I build. I'm glad the effort isn't in vain! |
|
|
| |
| ▲ | 6 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | tgv 6 days ago | parent | prev [-] | | I don't think you want to suggest that everyone must like it? |
| |
| ▲ | thrance 6 days ago | parent | prev | next [-] | | Anubis was originally an open source project built for a personal blog. It gained traction but the anime girl remained so that people are reminded of the nature of the project. Comparing it with Cloudflare is truly absurd. That said, a paid version is available with guard page customization. | |
| ▲ | troyvit 6 days ago | parent | prev [-] | | Nothing says, "Change out the logo for something that doesn't make my clients tingle in an uncomfortable way" like the MIT license. | | |
| ▲ | integralid 6 days ago | parent | next [-] | | I wonder why the anime girl is received so badly. Is it because it's seen as childish? Is it bad because it confuses people (i.e. don't do this because others don't do this)? Thinking about it logically, putting some "serious" banner there would just make everything a bit more grey and boring and would make no functional difference. So why is it disliked so much? | |
| ▲ | deanishe 5 days ago | parent | next [-] | | The GitHub unicorn doesn't look as if it came out of a furry dev's wank bank. | | |
| ▲ | troyvit 4 days ago | parent [-] | | Who are you to judge what's a wank bank and what's not? And what wank bank do you go to? The logo doesn't even have breasts. |
| |
| ▲ | ericpp 5 days ago | parent | prev | next [-] | | I'm glad that they kept the anime girl rather than replacing her with a sterile message. The Internet should be a fun place again. | |
| ▲ | 20after4 5 days ago | parent | prev | next [-] | | Because the world is full of haters? I personally find anime kind of cringe but that's just a matter of taste. | |
| ▲ | tgv 5 days ago | parent | prev [-] | | Why? It has sexual connotations, and it involves someone under the age of consent. As wikipedia puts it: "In a 2010 critique of the manga series Loveless, the feminist writer T. A. Noonan argued that, in Japanese culture, catgirl characteristics have a similar role to that of the Playboy Bunny in western culture, serving as a fetishization of youthful innocence." > Thinking about it logically This isn't about logic. | | |
| ▲ | troyvit 4 days ago | parent [-] | | > This isn't about logic. Clearly you proved that. What has sexual connotations is wildly subjective and plucking the opinion of one author/poet's critique from 15 years ago doesn't make it fact today. | | |
| ▲ | tgv 4 days ago | parent [-] | | It's about perception and feelings. If anime cat girls have sexual connotations (for a large enough group), that's the way it is. That critique didn't come out of thin air, and its age is hardly relevant. The association has been established. If you use a symbol that has a certain association, you shouldn't be surprised if people react to that association when they encounter that symbol. There's nothing wrong with "subjective", by the way. You seem to think it discredits something (can't say what exactly), but this topic is subjective. It's not about logic (as if anything outside maths ever is). |
|
|
| |
| ▲ | notpushkin 6 days ago | parent | prev [-] | | Keep in mind that the author explicitly asks you not to do this, and offers a paid white label version. You can still do it yourself, but maybe you shouldn’t. | | |
|
| |
| ▲ | jcelerier 6 days ago | parent | prev | next [-] | | Both are equally terrible - one doesn't require explanations to my boss though | | |
| ▲ | Aachen 6 days ago | parent | next [-] | | If your boss doesn't want you to browse the web, where some technical content is accompanied by an avatar that the author likes, they may not be suitable as boss, or at least not for positions where it's their job to look over your shoulder and make sure you're not watching series during work time. Seems like a weird employment place if they need to check that anyway | | |
| ▲ | jcelerier 6 days ago | parent [-] | | we have customers in our offices pretty much every day, I think "no anime girls on screens" is a fair request | | |
| ▲ | troyvit 6 days ago | parent | next [-] | | It's an MIT licensed, open project. Fork it and change the icon to your favorite white-bread corporate logo if you want. It would probably take less time than complaining about it on HN. | | |
| ▲ | Aachen 6 days ago | parent [-] | | I think the complaint is rather that you don't know when it will rear its face on third-party websites that you are visiting as part of work. Forking wouldn't help with not seeing it on other sites (Even if I agree that the boss or customers should just get over it. It's not like they're drawing genitalia on screen and it's also easily explainable if they don't already know it themselves.) | | |
| ▲ | efreak 5 days ago | parent [-] | | Add a rule to your adblocker for the image, then. The main site appears to have it at `anubis.techaro.lol/.within.website/x/cmd/anubis/static/img/happy.webp?cacheBuster=v1.21.3-43-gb0fa256`, so a rule for `||*/.within.website/x/cmd/anubis/static/img/$image` ought to work for ublock origin (purely a guess regarding wildcards for domain, I've never set a rule without a domain before) | | |
|
| |
| ▲ | sheepdestroyer 6 days ago | parent | prev [-] | | I fail to see how this particular "anime girl" and the potential for clients seeing it, could make you think that's a fair request. That seems extremely ridiculous to me. |
|
| |
| ▲ | ChocolateGod 6 days ago | parent | prev [-] | | If Anubis didn't ship with a weird looking anime girl I think people would treat it akin to Cloudflares block pages. | | |
| |
| ▲ | petralithic 6 days ago | parent | prev | next [-] | | We can make noise about both things, and how they're ruining the internet. | |
| ▲ | account42 6 days ago | parent | prev | next [-] | | Cloudflare's solution works without javascript enabled unless the website turns up the scare level to max or you are on an IP with already bad reputation. Anubis does not. But at the end of the day both are shit and we should not accept either. That includes not using one as an excuse for the other. | | |
| ▲ | superkuh 6 days ago | parent [-] | | Laughable. They say this but anyone who actually surfs the web with a non-bleeding edge non-corporate browser gets constantly blocked by Cloudflare. The idea that their JS computational paywalls only pop up rarely is absurd. Anyone believing this line lacks lived experience. My Comcast IP shouldn't have a bad rep and using a browser from ~2015 shouldn't make me scary. But I can't even read bills on congress.gov anymore thanks to bad CF deployments. Also, Anubis does have a non-JS mode: the HTML header meta-refresh based challenge. It's just that the type of people who use Cloudflare or Anubis almost always just deploy the default (mostly broken) configs that block as many humans as bots. And they never realize it because they only measure such things with JavaScript. |
| |
| ▲ | lupusreal 6 days ago | parent | prev | next [-] | | Over the past few years I've read far more comments complaining about Cloudflare doing it than Anubis. In fact, this discussion section is the first time I've seen people talking about Anubis. | |
| ▲ | ronsor 6 days ago | parent | prev [-] | | TO BE FAIR I dislike those even more. |
|
|
| ▲ | agwa 6 days ago | parent | prev | next [-] |
| It sounds like you're saying that it's not the proof-of-work that's stopping AI scrapers, but the fact that Anubis imposes an unusual flow to load the site. If that's true Anubis should just remove the proof-of-work part, so legitimate human visitors don't have to stare at a loading screen for several seconds while their device wastes electricity. |
| |
| ▲ | chrismorgan 6 days ago | parent | next [-] | | > If that's true Anubis should just remove the proof-of-work part This is my very strong belief. To make it even clearer how absurd the present situation is, every single one of the proof-of-work systems I’ve looked at has been using SHA-256, which is basically the worst choice possible. Proof-of-work is bad rate limiting which depends on a level playing field between real users and attackers. This is already a doomed endeavour. Using SHA-256 just makes it more obvious: there’s an asymmetry factor in the order of tens of thousands between common real-user hardware and software, and pretty easy attacker hardware and software. You cannot bridge such a divide. If you allow the attacker to augment it with a Bitcoin mining rig, the efficiency disparity factor can go up to tens of millions. These proof-of-work systems are only working because attackers haven’t tried yet. And as long as attackers aren’t trying, you can settle for something much simpler and more transparent. If they were serious about the proof-of-work being the defence, they’d at least have started with something like Argon2d. | | |
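To make the asymmetry concrete, below is a minimal brute-force sketch in Go of an SHA-256 challenge of the common form "find a nonce whose SHA-256 digest starts with N zero hex digits" (roughly what the article's C solver attacks; the exact parameters of any given deployment may differ). A single CPU core grinds through millions of these hashes per second, and GPU or ASIC hardware does orders of magnitude more, which is the disparity being described.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve brute-forces a nonce such that sha256(challenge + nonce) begins
// with `difficulty` zero hex digits.
func solve(challenge string, difficulty int) (nonce uint64, digest string) {
	prefix := strings.Repeat("0", difficulty)
	for {
		sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
		digest = hex.EncodeToString(sum[:])
		if strings.HasPrefix(digest, prefix) {
			return nonce, digest
		}
		nonce++
	}
}

func main() {
	// Difficulty 4 means roughly 16^4 = 65536 hashes on average: microseconds
	// of work for a server CPU, noticeably longer for an old phone in JS.
	n, h := solve("example-challenge-string", 4)
	fmt.Println("nonce:", n, "digest:", h)
}
```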
| ▲ | voidnap 6 days ago | parent | next [-] | | The proof of work isn't really the crux. They've been pretty clear about this from the beginning. I'll just quote from their blog post from January. https://xeiaso.net/blog/2025/anubis/ Anubis also relies on modern web browser features: - ES6 modules to load the client-side code and the proof-of-work challenge code. - Web Workers to run the proof-of-work challenge in a separate thread to avoid blocking the UI thread. - Fetch API to communicate with the Anubis server. - Web Cryptography API to generate the proof-of-work challenge. This ensures that browsers are decently modern in order to combat most known scrapers. It's not perfect, but it's a good start. This will also lock out users who have JavaScript disabled, prevent your server from being indexed in search engines, require users to have HTTP cookies enabled, and require users to spend time solving the proof-of-work challenge. This does mean that users using text-only browsers or older machines where they are unable to update their browser will be locked out of services protected by Anubis. This is a tradeoff that I am not happy about, but it is the world we live in now. | | |
| ▲ | account42 6 days ago | parent | next [-] | | Except this is exactly the problem. Now you are checking for mainstream browsers instead of some notion of legitimate users. And as TFA shows a motivated attacker can bypass all of that while legitimate users of non-mainstream browsers are blocked. | |
| ▲ | mewpmewp2 6 days ago | parent | prev | next [-] | | Aren't most scrapers using things like Playwright or Puppeteer anyway by now, especially since so many pages are rendered using JS and even without Anubis would be unreadable without executing modern JS? | |
| ▲ | rfoo 6 days ago | parent | prev [-] | | ... except when you do not crawl with a browser at all. It's so trivial to solve, just like the taviso post demonstrated. This makes zero sense; this is simply the wrong approach. Already tired of saying so and being attacked for it. So I'm glad professional-random-Internet-bullshit-ignorer Tavis Ormandy wrote this one. |
| |
| ▲ | username332211 6 days ago | parent | prev [-] | | All this is true, but also somewhat irrelevant. In reality the amount of actual hash work is completely negligible. For usability reasons Anubis only requires that you go through the proof-of-work flow once in a given period. (I think the default is once per week.) That's just very little work. Figuring out that you need to occasionally send a request through a headless browser is far more of a hassle than the PoW. If you prefer LLMs rather than normal internet search, it'll probably consume far more compute as well. | | |
| ▲ | rendx 6 days ago | parent [-] | | > For usability reasons Anubis only requires that you go through the proof-of-work flow once in a given period. (I think the default is once per week.) That's just very little work. If you keep cookies. I do not want to keep cookies for otherwise "stateless" sites. I have maybe a dozen sites whitelisted; every other site loses cookies when I close the tab. | | |
| ▲ | account42 6 days ago | parent | next [-] | | A bigger problem is that you should not have to enable javascript for otherwise static sites. If you enable JS, cookies are a relatively minor issue compared to all the other ways the website can keep state about you. | |
| ▲ | username332211 6 days ago | parent | prev [-] | | Well, that's not a problem when scraping. Most scraping libraries have ways to retain cookies. |
|
|
| |
| ▲ | kaszanka 6 days ago | parent | prev | next [-] | | This is basically what most of the challenge types in go-away (https://git.gammaspectra.live/git/go-away/wiki/Challenges) do. | | |
| ▲ | Tmpod 6 days ago | parent [-] | | +1 for go-away. It's a bit more involved to configure, but worth the effort imo. It can be considerably more transparent to the user, triggering the nuclear PoW check less often, while being just as effective, in my experience. |
| |
| ▲ | amarant 6 days ago | parent | prev | next [-] | | I feel like the future will have this, plus ads displayed while the work is done, so websites can profit while they profit. | | |
| ▲ | silversmith 6 days ago | parent | next [-] | | Every now and then I consider stepping away from the computer job, and becoming a lumberjack. This is one of those moments. | | |
| ▲ | jones89176 6 days ago | parent | next [-] | | My family takes care of a large-ish forest, so I've had to help since my early teens. Let me tell you: think twice, it's f*ckin dangerous. Chainsaws, winches, heavy trees falling and breaking in unpredictable ways. I had a couple of close calls myself. Recently a guy from a neighbor village was squashed to death by a root plate that tilted. I often think about quitting tech myself, but becoming a full-time lumberjack is certainly not an alternative for me. | | |
| ▲ | silversmith 5 days ago | parent [-] | | Hah, I know, been around forests since childhood, seen (and done) plenty of sketchy stuff. For me it averages out to couple days of forest work a year. It's backbreaking labour, and then you deal with the weather. But man, if tech goes straight into cyberpunk dystopia but without the cool gadgets, maybe it is the better alternative. |
| |
| ▲ | zxexz 6 days ago | parent | prev [-] | | Worth getting to know the in and outs of forest management now. I don’t think AI will take most tech jobs soon, but they sure as hell are already making them boring. |
| |
| ▲ | JimDabell 6 days ago | parent | prev [-] | | adCAPTCHA already does this: https://adcaptcha.com | | |
| ▲ | Tmpod 6 days ago | parent | next [-] | | This is a joke, right? The landing page makes it seem so. I tried the captcha in their login page and it made the entire page, including the puzzle piece slider, run at 2 fps. My god, we do really live in 2025. | |
| ▲ | Aachen 6 days ago | parent | prev [-] | | Holy shit. Opening the demo from the menu, it's like captchas and youtube ads had a baby |
|
| |
| ▲ | tptacek 6 days ago | parent | prev | next [-] | | Exactly this. | |
| ▲ | empath75 6 days ago | parent | prev [-] | | I don't think anything will stop AI companies for long. They can do spot AI agentic checks of workflows that stop working for some reason and the AI can usually figure out what the problem is and then update the workflow to get around it. |
|
|
| ▲ | hedora 6 days ago | parent | prev | next [-] |
| This was obviously dumb when it launched: 1) scrapers just run a full browser and wait for the page to stabilize. They did this before this thing launched, so it probably never worked. 2) The AI reading the page needs something like 5 seconds * 1600W to process it. Assuming my phone can even perform that much compute as efficiently as a server class machine, it’d take a large multiple of five seconds to do it, and get stupid hot in the process. Note that (2) holds even if the AI is doing something smart like batch processing 10-ish articles at once. |
| |
| ▲ | pilif 6 days ago | parent | next [-] | | > This was obviously dumb when it launched: Yes. Obviously dumb but also nearly 100% successful at the current point in time. And likely going to stay successful as the non-protected internet still provides enough information to dumb crawlers that it’s not financially worth it to even vibe-code a workaround. Or in other words: Anubis may be dumb, but the average crawler that is completely exhausting some sites’ resources is even dumber. And so it all works out. And so the question remains: how dumb was it exactly, when it works so well and continues to work so well? | | |
| ▲ | account42 6 days ago | parent | next [-] | | > Yes. Obviously dumb but also nearly 100% successful at the current point in time. Only if you don't care about negatively affecting real users. | | |
| ▲ | pilif 6 days ago | parent [-] | | I understand this as an argument that it’s better to be down for everyone than have a minority of users switch browsers. I’m not convinced by that makes sense. Now ideally you would have the resources to serve all users and all the AI bots without performance degradation, but for some projects that’s not feasible. In the end it’s all a compromise. |
| |
| ▲ | kldg 6 days ago | parent | prev | next [-] | | does it work well? I run chromium controlled by playwright for scraping and typically make Gemini implement the script for it because it's not worth my time otherwise. -but I'm not crawling the Internet generally (which I think there is very little financial incentive to do; it's a very expensive process even ignoring Anubis et al); it's always that I want something specific and am sufficiently annoyed by lack of API. regarding authentication mentioned elsewhere, passing cookies is no big deal. | | |
| ▲ | eaglefield 6 days ago | parent [-] | | Anubis is not meant to stop single endpoints from scraping. It's meant to make it harder for massive AI scrapers. The problematic ones evade rate limiting by using many different ip addresses, and make scraping cheaper on themselves by running headless. Anubis is specifically built to make that kind of scraping harder as i understand it. |
| |
| ▲ | bananalychee 6 days ago | parent | prev | next [-] | | Does it actually? I don't think I've seen a case study with hard numbers. | | | |
| ▲ | snickerdoodle12 6 days ago | parent | prev [-] | | the workaround is literally just running a headless browser, and that's pretty much the default nowadays. if you want to save some $$$ you can spend like 30 minutes making a cracker like in the article. just make it multi threaded, add a queue and boom, your scraper nodes can go back to their cheap configuration. or since these are AI orgs we're talking about, write a gpu cracker and laugh as it solves challenges far faster than any user could. custom solutions aren't worth it for individual sites, but with how widespread anubis is it's become worth it. |
| |
| ▲ | pama 6 days ago | parent | prev | next [-] | | I agree. Your estimate for (2), about 0.0022 kWh, corresponds to about a sixth of the charge of an iPhone 15 pro and would take longer than ten minutes on the phone, even at max power draw. It feels about right for the amount of energy/compute of a large modern MoE loading large pages of several 10k tokens. For example this tech (couple month old) could input 52.3k tokens per second to a 672B parameter model, per H100 node instance, which probably burns about 6–8kW while doing it. The new B200s should be about 2x to 3x more energy efficient, but your point still holds within an order of magnitude. https://lmsys.org/blog/2025-05-05-large-scale-ep/ | |
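For reference, the arithmetic behind that figure, taking the parent's 5 s at 1600 W at face value:

\[
5\,\mathrm{s} \times 1600\,\mathrm{W} = 8000\,\mathrm{J} = \frac{8000}{3.6\times10^{6}}\,\mathrm{kWh} \approx 0.0022\,\mathrm{kWh},
\]

which is indeed about a sixth of a phone battery in the 12-13 Wh range.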
| ▲ | rob_c 6 days ago | parent | prev [-] | | The argument doesn't quite hold. The mass scraping (for training) is almost never done by a GPU system; it's almost always done by a dedicated system running a full Chrome fork in some automated way (not just the signatures but some bugs give that away). And frankly, processing a single page of text fits within a single token window, so it likely runs for a blink (ms) before moving on to the next data entry. The kicker is it's run over potentially thousands of times depending on your training strategy. At inference there's now a dedicated tool that may perform a "live" request to scrape the site contents. But then this is just pushed into a massive context window to give the next token anyway. | | |
| ▲ | account42 6 days ago | parent [-] | | The point is that scraping is already inherently cost-intensive so a small additional cost from having to solve a challenge is not going to make a dent in the equation. It doesn't matter what server is doing what for that. | | |
| ▲ | mistercheph 6 days ago | parent [-] | | 100 billion web pages * 0.02 USD of PoW/page = 2 billion dollars, the point is not to stop every scraper/crawler, the point is to raise the costs enough to avoid being bombarded by all of them | | |
| ▲ | jsnell 6 days ago | parent | next [-] | | Yes, but it's not going to be 0.02 USD of PoW per page! That is an absurd number. It'd mean a two-hour proof of work for a server CPU, a ten hour proof of work for a phone. In reality you can do maybe a 1/10000th of that before the latency hit to real users becomes unacceptable. And then, the cost is not per page. The cost is per cookie. Even if the cookie is rate-limited, you could easily use it for 1000 downloads. Those two errors are multiplicative, so your numbers are probably off by about 7 orders of magnitudes. The cost of the PoW is not going to be $2B, but about $200. | |
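Spelling out the parent's correction under its own assumptions (per-page PoW cost roughly 1/10,000 of the $0.02 figure, and each solved challenge cookie reused for roughly 1,000 fetches):

\[
\frac{10^{11}\ \text{pages} \times \$0.02}{10^{4}\times 10^{3}} = \frac{\$2\times10^{9}}{10^{7}} \approx \$200.
\]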
| ▲ | skeptrune 6 days ago | parent | prev [-] | | I'm going to phrase the explanation like this in the future. Couldn't have said it better myself. |
|
|
|
|
|
| ▲ | psionides 6 days ago | parent | prev | next [-] |
| The problem is that 7 + 2 on a submission form only affects people who want to submit something, Anubis affects every user who wants to read something on your site |
| |
| ▲ | account42 6 days ago | parent [-] | | The question then is why read-only users are consuming so many resources that serving them big chunks of JS instead reduces the load on the server. Maybe improve your rendering and/or caching before employing DRM solutions that are doomed to fail anyway. | | |
| ▲ | Mateon1 6 days ago | parent [-] | | The problem it's originally fixing is bad scrapers accessing dynamic site content that's expensive to produce, like trying to crawl all diffs in a git repo, or all mediawiki oldids. Now it's also used on mostly static content because it is effective vs scrapers that otherwise ignore robots.txt. |
|
|
|
| ▲ | monooso 6 days ago | parent | prev | next [-] |
| The author makes it very clear that he understands the problem Anubis is attempting to solve. His issue is that the chosen approach doesn't solve that problem; it just inhibits access to humans, particularly those with limited access to compute resources. That's the opposite of being dismissive. The author has taken the time to deeply understand both the problem and the proposed solution, and has taken the time to construct a well-researched and well-considered argument. |
|
| ▲ | Aurornis 6 days ago | parent | prev | next [-] |
| > This is a usually technical crowd, so I can't help but wonder if many people genuinely don't get it, or if they are just feigning a lack of understanding to be dismissive of Anubis. This is a confusing comment because it appears you don’t understand the well-written critique in the linked blog post. > This is like those simple things on submission forms that ask you what 7 + 2 is. Of course everyone knows that a crawler can calculate that! But it takes a human some time and work to tell the crawler HOW. The key point in the blog post is that it’s the inverse of a CAPTCHA: The proof of work requirement is solved by the computer automatically. You don’t have to teach a computer how to solve this proof of work because it’s designed for the computer to solve the proof of work. It makes the crawling process more expensive because it has to actually run scripts on the page (or hardcode a workaround for specific versions) but from a computational perspective that’s actually easier and far more deterministic than trying to have AI solve visual CAPTCHA challenges. |
| |
| ▲ | necovek 6 days ago | parent [-] | | But for actual live users who don't see anything but a transient screen, Anubis is a better experience than all those pesky CAPTCHAs (I am bored of trying to recognize bikes, pedestrian crossings, buses, hydrants). The question is if this is the sweet spot, and I can't find anyone doing the comparative study (how many annoyed human visitors, how many humans stopped and, obviously, how many bots stopped). | | |
| ▲ | JimDabell 6 days ago | parent [-] | | > Anubis is a better experience than all those pesky CAPTCHAs (I am bored of trying to recognize bikes, pedestrian crossings, buses, hydrants). Most CAPTCHAs are invisible these days, and Anubis is worse than them. Also, CAPTCHAs are not normally deployed just for visiting a site, they are mostly used when you want to submit something. | | |
| ▲ | necovek 6 days ago | parent [-] | | We are obviously living a different Internet reality, and that's the whole point — we need numbers to really establish baseline truth. FTR, I am mostly browsing from Serbia using Firefox browser on a Linux or MacOS machine. | | |
| ▲ | JimDabell 6 days ago | parent [-] | | I don’t think we are living in a different reality, I just don’t think you are accounting for all the CAPTCHAs you successfully pass without seeing. | | |
| ▲ | necovek 6 days ago | parent | next [-] | | Wouldn't it be nice to have a good study that supports either your or my view? FWIW, I've never been stopped by Anubis, so even if it's much more rarely implemented, that's still infinitely less than 5-10 captchas a day I do see regularly. I do agree it's still different scales, but I don't trust your gut feel either. Thus a suggestion to look for a study. | |
| ▲ | martin_a 6 days ago | parent | prev [-] | | Not OP but try browsing the web with a combination of Browser + OS that is slightly off to what most people use and you'll see Captchas pop up at every corner of the Internet. And if the new style of Captchas is then like this one it's much more disturbing. |
|
|
|
|
|
|
| ▲ | cakealert 6 days ago | parent | prev | next [-] |
| This arms race will have a terminus. The bots will eventually be indistinguishable from humans. Some already are. |
| |
| ▲ | overfeed 6 days ago | parent | next [-] | | > The bots will eventually be indistinguishable from humans Not until they get issued government IDs they won't! Extrapolating from current trends, some form of online ID attestation (likely based on government-issued ID[1]) will become normal in the next decade, and naturally, this will be included in the anti-bot arsenal. It will be up to the site operator to trust identities signed by the Russian government. 1. Despite what Sam Altman's eyeball company will try to sell you, government registers will always be the anchor of trust for proof-of-identity, they've been doing it for centuries and have become good at it and have earned the goodwill. | | |
| ▲ | marcus_holmes 6 days ago | parent | next [-] | | How does this work, though? We can't just have "send me a picture of your ID" because that is pointlessly easy to spoof - just copy someone else's ID. So there must be some verification that you, the person at the keyboard, are the same person as that ID identifies. The UK is rapidly finding out that that is extremely difficult to do reliably. Video doesn't really work reliably in all cases, and still images are too easily spoofed. It's not really surprising, though, because identifying humans reliably is hard even for humans. We could do it at the network level - like assigning a government-issued network connection to a specific individual, so the system knows that any traffic from a given IP address belongs to that specific individual. There are obvious problems with this model, not least that IP addresses were never designed for this, and spoofing an IP becomes identity theft. We also do need bot access for things, so there must be some method of granting access to bots. I think that to make this work, we'd need to re-architect the internet from the ground up. To get there, I don't think we can start from here. | | |
| ▲ | tern 6 days ago | parent | next [-] | | If you're really curious about this, there's a place where people discuss these problems annually: https://internetidentityworkshop.com/ Various things you're not thinking of: - "The person at the keyboard, is the same person as that ID identifies" is a high expectation, and can probably be avoided—you just need verifiable credentials and you gotta trust they're not spoofed - Many official government IDs are digital now - Most architectures for solving this problem involve bundling multiple identity "attestations," so proof of personhood would ultimately be a gradient. (This does, admittedly, seem complicated though ... but World is already doing it, and there are many examples of services where providing additional information confers additional trust. Blue checkmarks to name the most obvious one.) As for what it might look like to start from the ground up and solve this problem, https://urbit.org/, for all its flaws, is the only serious attempt I know of and proves it's possible in principle, though perhaps not in practice | | |
| ▲ | marcus_holmes 5 days ago | parent [-] | | that is interesting, thanks. Why isn't it necessary to prove that the person at the keyboard is the person in the ID? That seems like the minimum bar for entry to this problem. Otherwise we can automate the ID checks and the bots can identify as humans no problem. And how come the UK is failing so badly at this? |
| |
| ▲ | TheDong 6 days ago | parent | prev | next [-] | | We almost all have IC chip readers in our pocket (our cell phones), so if the government issues a card that has a private key embedded in it, akin to existing GnuPG SmartCards, you can use your phone to sign an attestation of your personhood. In fact, Japan already has this in the form of "My Number Card". You go to a webpage, the webpage says "scan this QR code, touch your phone to your ID card, and type in your pin code", and doing that is enough to prove to the website that you're a human. You can choose to share name/birthday/address, and it's possible to only share a subset. Robots do not get issued these cards. The government verifies your human-ness when they issue them. Any site can use this system, not just government sites. | |
| ▲ | 47282847 6 days ago | parent [-] | | Germany has this. The card plus PIN technically proves you are in current possession of both, not that you are the person (no biometrics or the like). You can choose to share/request not only certain data fields but also eg if you are below or above a certain age or height without disclosing the actual number. | | |
| ▲ | bregma 6 days ago | parent [-] | | > if you are below or above a certain age or height Is discrimination against dwarves still a thing in Germany? | | |
| ▲ | TheDong 6 days ago | parent [-] | | I want to believe that this would be used at amusement parks to scan "can I safely get on this ride" and at the entrance to stairs to tell you if you'll bump your head or not. | | |
| ▲ | 47282847 6 days ago | parent [-] | | The system as a whole is rarely used. I think it’s a combination of poor APIs and hesitation of the population. For somebody without technical knowledge, there is no obvious difference to the private video ID companies. On the surface, you may believe that all data is transferred anyway and you have to trust providers in all cases, not that some magic makes it so third parties don’t get more than necessary. I don’t know of any real world example that queries height, I mentioned it because it is part of the data set and privacy-preserving queries are technically possible. Age restrictions are the obvious example, but even there I am not aware of any commercial use, only for government services like tax filing or organ donor registry. Also, nobody really measures your height, you just tell them what to put there when you get the ID. Not so for birth dates, which they take from previous records going back to the birth certificate. |
|
|
|
| |
| ▲ | IncRnd 6 days ago | parent | prev | next [-] | | That is already solved by governments and businesses. If you have recently attempted to log into a US government website, you were probably told that you need Login.gov or ID.me. ID.me verifies identity via driver’s license, passport, Social Security number—and often requires users to take a video selfie, matched against uploaded ID images. If automated checks fail, a “Trusted Referee” video call is offered. If you think this sounds suspiciously close the what businesses do with KYC, Know Your Customer, you're correct! | |
| ▲ | gambiting 6 days ago | parent | prev | next [-] | | UK is stupidly far behind on this though. On one hand the digitization of government services is really well done(thanks to the fantastic team behind .gov websites), but on the other it's like being in the dark ages of tech. My native country has physical ID cards that contain my personal certificate that I can use to sign things or to - gasp! - prove that I am who I say I am. There is a government app that you can use to scan your ID card using the NFC chip in your phone, after providing it with a password that you set when you got the card it produces a token that can then be used to verify your identy or sign documents digitally - and those signatures legally have the same weight as real paper signatures. UK is in this weird place where there isn't one kind of ID that everyone has - for most people it's the driving licence, but obviously that's not good enough. But my general point is that UK could just look over at how other countries are doing it and copy good solutions to this problem, instead of whatever nonsense is being done right now with the age verification process being entirely outsourced to private companies. | | |
| ▲ | exasperaited 6 days ago | parent | next [-] | | > UK is in this weird place where there isn't one kind of ID that everyone has - for most people it's the driving licence, but obviously that's not good enough. As a Brit I personally went through a phase of not really existing — no credit card, no driving licence, expired passport - so I know how annoying this can be. But it’s worth noting that we have this situation not because of mismanagement or technical illiteracy or incompetence but because of a pretty ingrained (centuries old) political and cultural belief that the police shouldn’t be able to ask you “papers please”. We had ID cards in World War II, everyone found them egregious and they were scrapped. It really will be discussed in those terms each time it is mentioned, and it really does come down to this original aspect of policing by consent. So the age verification thing is running up against this lack of a pervasive ID, various KYC situations also do, we can get an ID card to satisfy verification for in-person voting if we have no others, but it is not proof of identity anywhere else, etc. It is frustrating to people who do not have that same cultural touchstone but the “no to ID” attitude is very very normal; generally the UK prefers this idea of contextual, rather than universal ID. It’s a deliberate design choice. | | |
| ▲ | marcus_holmes 5 days ago | parent [-] | | Same in Australia - there was a referendum about whether we should have government-issued ID cards, and the answer was an emphatic "NO". And Australia is hitting or going to hit the same problem with the age verification thing for social media. |
| |
| ▲ | jhbadger 3 days ago | parent | prev [-] | | >UK is in this weird place where there isn't one kind of ID that everyone has - for most people it's the driving licence, but obviously that's not good enough The US also lacks a national ID, but as a non-driver myself, this is handled by things called variously by state a "state ID" or a "non-driver's driving license". These look exactly like driver's licenses and can be used wherever those can for ID (like for flying) except for a line saying "not valid for driving". |
| |
| ▲ | xlbuttplug2 6 days ago | parent | prev | next [-] | | IDs would have to be reissued with a public/private key model you can use to sign your requests. > the person at the keyboard, is the same person as that ID identifies This won't be possible to verify - you could lend your ID out to bots but that would come at the risk of being detected and blanket banned from the internet. | | |
| ▲ | wredcoll 6 days ago | parent [-] | | I have a wonderful new idea for this problem space based on your username. |
| |
| ▲ | heavyset_go 6 days ago | parent | prev | next [-] | | Not good enough, providers and governments want proof of life and proof of identity that matches government IDs. Without that, anyone can pretend to be their dead grandma/murder victim, or someone whose ID they stole. | | |
| ▲ | sciencejerk 6 days ago | parent [-] | | How about a chip implant signed by the government hospital that attests for your vitality? Looks like this is where things are headed |
| |
| ▲ | phito 6 days ago | parent | prev | next [-] | | In Europe we have itsme. You link the phone app to your ID, then you can use it to scan QR codes to log into websites. | | |
| ▲ | swores 6 days ago | parent | next [-] | | "In Europe" is technically true but makes it sound more widely used than I believe it to be... though maybe my knowledge is out of date. Their website lists 24 supported countries (including some non-EU like UK and Norway, and missing a few of the 27 EU countries) - https://www.itsme-id.com/en-GB/coverage But does it actually have much use outside of Belgium? Certainly in the UK I've never come across anyone, government or private business, mentioning it - even since the law passed requiring many sites to verify that visitors are adults. I wouldn't even be familiar with the name if I hadn't learned about its being used in Belgium. Maybe some other countries are now using it, beyond just Belgium? | | |
| ▲ | phito 5 days ago | parent [-] | | Oh I wasn't aware of that. I remember a Dutch friend talking to me about a similar app they had. Maybe they have a re-branded version of it? |
| |
| ▲ | victorbjorklund 6 days ago | parent | prev | next [-] | | One problem with solutions like that is that the website needs to pay for every login. So you save a few dollars blocking scrapers but now you have to pay thousands of dollars to this company instead. | |
| ▲ | victorbjorklund 6 days ago | parent | prev | next [-] | | I'm from Europe and I've never heard of it | |
| ▲ | JimDabell 6 days ago | parent | prev [-] | | In Singapore, we have SingPass, which is also an OpenID Connect implementation. |
| |
| ▲ | PeterStuer 6 days ago | parent | prev | next [-] | | Officially sanctioned 2FA tied to your official government ID. Over here we have "It's me" [1]. Yes, you can in theory still use your ID card with a USB card reader for accessing gov services, but good luck finding up-to-date drivers for your OS or using it from a mobile, etc. [1] https://www.itsme-id.com/en-BE/ | |
| ▲ | sintax 6 days ago | parent [-] | | Except that itsme crap is not from the government and doesn't support activation on anything but a Windows / Mac machine. No Linux support at all, while the Belgian government stuff (CSAM) supports Linux just fine. | | |
| ▲ | PeterStuer 5 days ago | parent [-] | | It is from the banks, which leveraged their KYC processes, but it was adopted very broadly by government and many other ID-linked services. AFAIK it does not need a computer to activate, just your phone and one of those bank-issued 2FA challenge card readers. For CSAM, also AFAIK, the first activation includes a visit to your local municipality to verify your identity. Unless you go via itsme, as it is an authorized CSAM key holder. |
|
| |
| ▲ | throwaway1777 6 days ago | parent | prev [-] | | It doesn’t require a ground-up rework. The easiest idea is that real people can get an official online ID at some site like login.gov and website operators verify people using that API. Some countries already have this kind of thing from what I understand. The tech bros want to implement this on the blockchain but the government could also do it. |
| |
| ▲ | bhawks 6 days ago | parent | prev | next [-] | | Can't wait to sign into my web browser with my driver's license. | | |
| ▲ | weberer 6 days ago | parent | next [-] | | What's next? Requiring a license to make toast in your own damn toaster? | | |
| ▲ | jijijijij 5 days ago | parent [-] | | > your own damn toaster Silly you, joking around like that. Can you imagine owning a toaster?! Sooo inconvenient and unproductive! Guess, if you change your housing plan, you gonna bring it along like an infectious tick? Hahah — no thank you! :D You will own nothing and you will be happy! (Please be reminded, failing behavioral compliance with, and/or voicing disapproval of this important moral precept, jokingly or not, is in violation of your citizenship subscription's general terms and conditions. This incident will be reported. Customer services will assist you within 48 hours. Please, do not leave your base zone until this issue has been resolved to your satisfaction.) |
| |
| ▲ | overfeed 6 days ago | parent | prev | next [-] | | In all likelihood, most people will do so via the Apple Wallet (or the equivalent on their non-Apple devices). It's going to be painful to use Open source OSes for a while, thanks to CloudFlare and Anubis. This is not the future I want, but we can't have nice things. | | |
| ▲ | hedora 6 days ago | parent | next [-] | | No worries. Stick an unregistered copy of Win 11 (MS doesn’t seem to care) and your driver’s license in an isolated VM and let the AI RDP into it for you. Manually browsing the web yourself will probably be trickier moving forward though. | |
| ▲ | account42 6 days ago | parent | prev [-] | | > This is not the future I want, but we can't have nice things. Actually, we can if we collectively decide that we should have them. Refuse to use sites that require these technologies and demand governments to solve the issue in better ways, e.g. by ensuring there are legal consequences for abusive corporations. |
| |
| ▲ | heavyset_go 6 days ago | parent | prev [-] | | "Luckily" you won't have to do only that, you'll need to provide live video to prove you're the person in the ID and that you're alive. |
| |
| ▲ | xlbuttplug2 6 days ago | parent | prev | next [-] | | The internet would come to a grinding halt as everyone would suddenly become mindful of their browsing. It's not hard to imagine a situation where, say, pornhub sells its access data and the next day you get sacked at your teaching job. | | |
| ▲ | chmod775 6 days ago | parent | next [-] | | It doesn't need to. Thanks to asymmetric cryptography governments can in theory provide you with a way to prove you are a human (or of a certain age) without: 1. the government knowing who you are authenticating yourself to 2. or the recipient learning anything but the fact that you are a human 3. or the recipient being able to link you to a previous session if you authenticate yourself again later The EU is trying to build such a scheme for online age verification (I'm not sure if their scheme also extends to point 3 though. Probably?). | | |
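A toy sketch of the kind of primitive that makes points 1 and 3 possible is a blind signature. The parameters below are deliberately tiny and insecure, and real proposals (such as the EU scheme mentioned above) use more involved constructions, but the shape is the same: the issuer attests to a token it never sees, so it cannot later link the token back to the citizen.

    from math import gcd
    from secrets import randbelow

    # Issuer ("government") key pair. Toy primes, hopelessly insecure;
    # real keys are 2048+ bits and real credentials use stronger schemes.
    p, q = 10007, 10009
    n, e = p * q, 65537
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)                  # issuer's private exponent

    # Citizen picks a random token m they want attested ("holder is an adult").
    m = randbelow(n - 2) + 2

    # Blind it with a random factor r before sending it to the issuer.
    while True:
        r = randbelow(n - 2) + 2
        if gcd(r, n) == 1:
            break
    m_blind = (m * pow(r, e, n)) % n     # the issuer only ever sees this

    # Issuer signs the blinded value without learning m.
    s_blind = pow(m_blind, d, n)

    # Citizen unblinds; s is now a valid signature on m.
    s = (s_blind * pow(r, -1, n)) % n

    # Any site can check the attestation with the issuer's public key, but
    # neither the site nor the issuer can link (m, s) to the blinded request.
    assert pow(s, e, n) == m
    print("attestation verifies:", pow(s, e, n) == m)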
| ▲ | palata 6 days ago | parent | next [-] | | But I don't get how this goes for spam or scraping: if I can pass the test "anonymously", then what prevents me from doing it for illegal purposes? I get it for age verification: it is difficult for a child to get a token that says they are allowed to access porn because adults around them don't want them to access porn (and even though one could sell tokens online, it effectively makes it harder to access porn as a child). But how does it prevent someone from using their ID to get tokens for their scraper? If it's anonymous, then there is no risk in doing it, is there? | |
| ▲ | 986aignan 6 days ago | parent | next [-] | | IIRC, you could use asymmetric cryptography to derive a site-specific pseudonymous token from the service and your government ID without the service knowing what your government ID is or the government provider knowing what service you are using. The service then links the token to your account and uses ordinary detection measures to see if you're spamming, flooding, phishing, whatever. If you do, the token gets blacklisted and you can no longer sign on to that service. This isn't foolproof - you could still bribe random people on the street to be men/mules in the middle and do your flooding through them - but it's much harder than just spinning up ten thousand bots on a residential proxy. | | |
| ▲ | palata 6 days ago | parent [-] | | But that does not really answer my question: if a human can prove that they are human anonymously (by getting an anonymous token), what prevents them from passing that token to an AI? The whole point is to prevent a robot from accessing the API. If you want to detect the robot based on its activity, you don't need to bother humans with the token in the first place: just monitor the activity. | | |
| ▲ | xlbuttplug2 6 days ago | parent [-] | | It does not prevent a bot from using your ID. But a) the repercussions for getting caught are much more tangible when you can't hide behind anonymity - you risk getting blanket banned from the internet and b) the scale is significantly reduced - how many people are willing to rent/sell their IDs, i.e., their right to access the internet? Edit: ok I see the argument that the feedback mechanism could be difficult when all the website can report is "hey, you don't know me but this dude from request xyz you just authenticated fucked all my shit up". But at the end of the day, privacy preservation is an implementation detail I don't see governments guaranteeing. | | |
| ▲ | palata 5 days ago | parent [-] | | > But at the end of the day, privacy preservation is an implementation detail I don't see governments guaranteeing. Sure, I totally see how you can prevent unwanted activity by identifying the users. My question was about the privacy-preserving way. I just don't see how that would be possible. |
|
|
| |
| ▲ | terribleperson 6 days ago | parent | prev [-] | | One option I can think of is that the attesting authority might block you if you're behaving badly. | | |
| ▲ | account42 6 days ago | parent [-] | | That doesn't work without the attesting authority knowing what you are doing, which would make this scheme no longer anonymous. | | |
| ▲ | A1kmm 6 days ago | parent [-] | | It does work as long as the attesting authority doesn't allow issuing a new identity (before it expires) if the old one is lost. You (Y) generate a keypair and send your public key to the attesting authority A, and keep your private key. You get a certificate. You visit site b.com, and it asks for your identity, so you hash b.com|yourprivatekey. You submit the hash to b.com, along with a ZKP that you possess a private key that makes the hash work out, and that the private key corresponds to the public key in the certificate, and that the certificate has a valid signature from A. If you break the rules of b.com, b.com bans your hash. Also, they set a hard rate limit on how many requests per hash are allowed. You could technically sell your hash and proof, but a scraper would need to buy up lots of them to do scraping. Now the downside is that if you go to A and say your private key was compromised, or you lost control of it - the answer has to be tough luck. In reality, the certificates would expire after a while, so you could get a new hash every 6 months or something (and circumvent the bans), and if you lost the key, you'd need to wait out the expiry. The alternative is a scheme where you and A share a secret key - but then they can calculate your hash and conspire with b.com to unmask you. | |
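The per-site pseudonym part of that scheme is the easy bit to sketch; the hard part, the ZKP tying the hash to A's certificate, is waved away entirely here. A rough illustration, not a complete protocol:

    import hashlib
    import secrets

    private_key = secrets.token_bytes(32)    # stand-in for Y's real private key

    def site_pseudonym(site: str, key: bytes) -> str:
        # hash(site | private key): stable per site, unlinkable across sites
        return hashlib.sha256(site.encode() + b"|" + key).hexdigest()

    # b.com always sees the same hash, so it can rate-limit or ban it...
    assert site_pseudonym("b.com", private_key) == site_pseudonym("b.com", private_key)

    # ...while c.com sees an unrelated value, so the two sites cannot collude
    # to link the same visitor by pseudonym alone.
    print(site_pseudonym("b.com", private_key))
    print(site_pseudonym("c.com", private_key))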
| ▲ | palata 6 days ago | parent [-] | | Isn't the whole point of a privacy-preserving scheme that you can request many "certificates" from the attesting authority and it won't care (because you may need as many as the number of websites you visit), and the website b.com won't be able to link you to them, and therefore if it bans certificate C1, you can just start using certificate C2? And then of course, if you need millions of certificates because b.com keeps banning you, it means that they ban you based on your activity, not based on your lack of certificate. And in that case, it feels like the certificate is useless in the first place: b.com has to monitor and ban you already. Or am I missing something? |
|
|
|
| |
| ▲ | heavyset_go 6 days ago | parent | prev | next [-] | | There isn't a technical solution to this: governments and providers not only want proof of identity matching IDs, they want proof of life, too. This will always end with live video of the person requesting to log in to provide proof of life at the very least, and if they're lazy/want more data, they'll tie in their ID verification process to their video pipeline. | | |
| ▲ | debugnik 6 days ago | parent [-] | | You already provided proof of a living legal identity when you got the ID, and it already expires to make you provide proof again every few years. | | |
| ▲ | heavyset_go 6 days ago | parent [-] | | That's not the kind of proof of life the government and companies want online. They want to make sure their video identification 1) is of a living person right now, and 2) that living person matches their government ID. It's a solution to the "grandma died but we've been collecting her Social Security benefits anyway", or "my son stole my wallet with my ID & credit card", or (god forbid) "We incapacitated/killed this person to access their bank account using facial ID". It's also a solution to the problem advertisers, investors and platforms face of 1) wanting huge piles of video training data for free and 2) determining that a user truly is a monetizable human being and not a freeloader bot using stolen/sold credentials. | |
| ▲ | palata 6 days ago | parent | next [-] | | > That's not not the kind of proof of life the government and companies want online. Well that's your assumption about governments, but it doesn't have to be true. There are governments that don't try to exploit their people. The question is whether such governments can have technical solutions to achieve that or not (I'm genuinely interested in understanding whether or not it's technically feasible). | |
| ▲ | debugnik 6 days ago | parent | prev [-] | | It's the kind of proof my government already asks of me to sign documents much, much more important than watching adult content, such as social security benefits. |
|
|
| |
| ▲ | cakealert 6 days ago | parent | prev | next [-] | | Such schemes have the fatal flaw that they can be trivially abused. All you need are a couple of stolen/sold identities and bots start proving their humanness and adultness to everyone. | | |
| ▲ | overfeed 6 days ago | parent | next [-] | | > Such schemes have the fatal flaw that they can be trivially abused I wouldn't expect the abuse rate to be higher than what it is for chip-and-pin debit cards. PKI failure modes are well understood and there are mitigations galore. | |
| ▲ | Almondsetat 6 days ago | parent | prev [-] | | Blatant automatic behavior can still be detected, and much more definitive actions can be taken in such a system. | |
| ▲ | palata 6 days ago | parent [-] | | Detecting is a thing, but how do you identify the origin if it was done in a privacy-preserving manner? The whole point was that you couldn't, right? |
|
| |
| ▲ | xlbuttplug2 6 days ago | parent | prev | next [-] | | I did think asymmetric cryptography but I assumed the validators would be third parties / individual websites and therefore connections could be made using your public key. But I guess having the government itself provide the authentication service makes more sense. I wonder if they'd actually honor 1 instead of forcing recipients to be registered, as presumably they'd be interested in tracking user activity. | |
| ▲ | ummonk 6 days ago | parent | prev [-] | | How would it prevent you from renting your identity out to a bot farm? | | |
| ▲ | overfeed 6 days ago | parent [-] | | Besides making yourself party to a criminal conspiracy, I suspect it would be partly the same reason you won't sell/rent your real-world identity to other people today; an illegal immigrant may be willing to rent it from you right now. Mostly, it will be because online identities will be a market for lemons: there will be so many fake/expired/revoked identities being sold that the value of each one will be worth pennies, and that's not commensurate with the risk of someone committing crimes and linking it to your government-registered identity. | |
| ▲ | palata 6 days ago | parent | next [-] | | > the same reason you won't sell/rent your real-world identity to other people today If you sell your real-world identity to other people today, and they get arrested, then the police will know your identity (obviously). How does that work with a privacy-preserving scheme? If you sell your anonymous token that says that you are a human to a machine and the machine gets arrested, then the police won't be able to know who you are, right? That was the whole point of the privacy-preserving token. I'm genuinely interested, I don't understand how it can work technically and be privacy-preserving. | | |
| ▲ | cakealert 6 days ago | parent [-] | | It would appear most of the people commenting on the subject don't even understand it. With privacy preserving cryptography the tokens are standalone and have no ties to the identity that spawned them. No enforcement for abuse is possible. | | |
| ▲ | overfeed 5 days ago | parent | next [-] | | > With privacy preserving cryptography the tokens are standalone and have no ties to the identity that spawned them. I suspect there will be different levels of attestation, from the anonymous ("this is an adult"), to semi-anonymous ("this person was born in 20YY and resides in administrative region XYZ"), to the complete record ("This is John Quincy Smith III born on YYYY-MM-DD with ID doc number ABC123"). Somewhere in between the extremes is a pseudonymous token that's strongly tied to a single identity with non-repudiation. Anonymous identities that can be easily churned out on demand by end-users have zero antibot utility. | |
| ▲ | cakealert 5 days ago | parent [-] | | The latter attestation will be completely useless for privacy. | | |
| ▲ | overfeed 3 days ago | parent [-] | | 100% agree, but it will be necessary for any non-repudiation use cases, like signing contracts remotely. There is no one size fits all approach for online identity management. |
|
| |
| ▲ | palata 6 days ago | parent | prev [-] | | Right, that's my feeling as well | | |
| ▲ | overfeed 5 days ago | parent [-] | | While it's the privacy advocate's ideal, the political reality is that very few governments will deploy "privacy preserving" cryptography that gets in the way of LE investigations[1]. The best you can hope for is some escrowed service that requires a warrant to unmask the identity for any given token, so privacy is preserved in most cases, and against most parties except law enforcement when there's a valid warrant. 1. They can do it overtly in the design of the system, or covertly via side-channels, logging, or leaking bits in ways that are hard for an outsider to investigate without access to the complete source code and/or system outputs, such as not-quite-random pseudo-randoms. |
|
|
| |
| ▲ | coolcoder613 6 days ago | parent | prev [-] | | > Mostly, it will be because online identities will be a market for lemons: there will be so many fake/expired/revoked identities being sold that the value of each one will be worth pennies, and that's not commensurate with the risk of someone committing crimes and linking it to your government-registered identity.
That would be trivially solved by using the same verification mechanisms they would be used with. |
|
|
| |
| ▲ | wredcoll 6 days ago | parent | prev | next [-] | | I live with the naïve and optimistic dream that something like that would just show that everyone was in the list so they can't use it to discriminate against people. | |
| ▲ | account42 6 days ago | parent | prev | next [-] | | You are right about the negative outcomes that this might have but you have way too much faith in the average person caring enough before it happens to them. | |
| ▲ | glandium 6 days ago | parent | prev [-] | | > sells its access data or has it leaked somehow. |
| |
| ▲ | tern 6 days ago | parent | prev | next [-] | | The eyeball company's play is to be a general identity provider, which is an obvious move for anyone who tries to fill this gap. You can already connect your passport in the World app. https://world.org/blog/announcements/new-world-id-passport-c... | |
| ▲ | esnard 6 days ago | parent [-] | | Note: one of the founders of the World app is Sam Altman. |
| |
| ▲ | JimDabell 6 days ago | parent | prev | next [-] | | > some form of online ID attestation (likely based on government-issued ID[1]) will become normal in the next decade I believe this is likely, and implemented in the right way, I think it will be a good thing. A zero-knowledge way of attesting persistent pseudonymous identity would solve a lot of problems. If the government doesn’t know who you are attesting to, the service doesn’t know your real identity, services can’t correlate users, and a service always sees the same identity, then this is about as privacy-preserving as you can get with huge upside. A social media site can ban an abusive user without them being able to simply register a new account. One person cannot operate tens of thousands of bot profiles. Crawlers can be banned once. Spammers can be locked out of email. | | |
| ▲ | akk0 6 days ago | parent | next [-] | | > A social media site can ban an abusive user without them being able to simply register a new account. This is an absolutely gargantuan-sized antifeature that would single-handedly drive me out of the parts of the internet that choose to embrace this hellish tech. | | |
| ▲ | JimDabell 6 days ago | parent [-] | | I think social media platforms should have the ability to effectively ban abusive users, and I’m pretty sure that’s a mainstream viewpoint shared by most people. The alternative is that you think people should be able to use social media platforms in ways that violate their rules, and that the platforms should not be able to refuse service to these users. I don’t think that’s a justifiable position to take, but I’m open to hearing an argument for it. Simply calling it “hellish” isn’t an argument. And can you clarify if your position accounts for spammers? Because as far as I can see, your position is very clearly “spammers should be allowed to spam”. | | |
| ▲ | akk0 4 days ago | parent [-] | | No, my position is not any of these things you just decided to attribute to me. Allowing people to make alternate accounts has been the status quo on the internet since time immemorial, if only because it's currently not preventable. False bans are not rare (I only got unbanned from LinkedIn after getting banned with no explanation and having my appeal initially denied, for instance). I've gotten banned on places, rightfully (in my view) or not, then come back on a new account and avoided stepping on anyone's toes and lived happily ever after, too. Of course in the ideal world all bans would be handed out correctly, be of a justified duration, and offer due process to those banned. We don't live in that world, the incentive is emphatically NOT to handle appeals fairly and understandably. Getting truly permanently banned on a major platform can be a life changing experience. In reality users can generally get away with signing up new accounts, but new users will be marked somehow and/or limited (e.g. green names on HN) and get extra scrutiny, and sign-ups will have friction and limits to let it not scale up to mass spammer scale. The rest is handled manually by moderation staff. The limits to moderator power are a feature that compensates for the limits to moderator competence. |
|
| |
| ▲ | ibejoeb 6 days ago | parent | prev [-] | | >A zero-knowledge way of attesting persistent pseudonymous identity why would a government do that though? the alternative is easier and gives it more of what it wants. | | |
| |
| ▲ | exasperaited 6 days ago | parent | prev | next [-] | | At this future point, AI firms will simply rent people’s identities to use online. | | |
| ▲ | account42 6 days ago | parent [-] | | They are already getting people hooked on "free" access so they will have plenty of subjects willing to do that to keep that access. | | |
| ▲ | exasperaited 6 days ago | parent [-] | | And if they are as successful as they are threatening to be, they will have destroyed so many jobs that I am sure they will find a few thousand people across the world who will accept a stipend to loan their essence to the machine. |
|
| |
| ▲ | john01dav 6 days ago | parent | prev | next [-] | | This has quite nasty consequences for privacy. For this reason, alternatives are desirable. I have less confidence on what such an alternative should be, however. | | |
| ▲ | palata 6 days ago | parent [-] | | Can you elaborate on that? Are you implying that it is strictly impossible to do this in a privacy-preserving way? | | |
| ▲ | michaelt 6 days ago | parent | next [-] | | It depends on your precise requirements and assumptions. Does your definition of 'privacy-preserving' distrust Google, Apple, Xiaomi, HTC, Honor, Samsung and suchlike? Do you also distrust third-party clowns like Experian and Equifax (whose current systems have gaping security holes) and distrust large government IT projects (which are outsourced to clowns like Fujitsu who don't know what they're doing)? Do you require it to work on all devices, including outdated phones and tablets; PCs; Linux-only devices; other networked devices like smart lightbulbs; and so on? Does it have to work in places phones aren't allowed, or mobile data/bluetooth isn't available? Does the identity card have to be as thin, flexible, durable and cheap as a credit card, precluding any built-in fingerprint sensors and suchlike? Does the age validation have to protect against an 18-year-old passing the age check on their 16-year-old friend's account? While also being privacy-preserving enough nobody can tell the two accounts were approved with the same ID card? Does the system also have to work on websites without user accounts, because who the hell creates a pornhub account anyway? Does the system need to work without the government approving individual websites' access to the system? Does it also need to support proving things like name, nationality, and right to work in the country so people can apply for bank accounts and jobs online? And yet does it need to prevent sites from requiring names just for ad targeting purposes? Do all approvals have to be provable, so every company can prove to the government that the checks were properly carried out at the right time? Does it have to be possible to revoke cards in a timely manner, but without maintaining a huge list of revoked cards, and without every visit to a porn site triggering a call to a government server for a revocation check? If you want to accomplish all of these goals - you're going to have a tough time. | |
| ▲ | palata 5 days ago | parent [-] | | Not sure what you are trying to say. I can easily imagine having a way to prove my age in a privacy-preserving way: a trusted party knows that I am 18+ and gives me a token that proves that I am 18+ without divulging anything else. I take that token and pass it to the website that requires me to be 18+. The website knows nothing about me other than I have a token that says I am 18+. Of course, I can get a token and then give it to a child. Just like I can buy cigarettes and give them to a child. But the age verification helps in that I don't want children to access cigarettes, so I won't do it. The "you are a human" verification fundamentally doesn't work, because the humans who make the bots are not aligned with the objective of the verification. If it's privacy-preserving, it means that a human can get a token, feed it to their bot and call it a day. And nobody will know who gave the token to the bot, precisely because it is privacy-preserving. |
| |
| ▲ | john01dav 6 days ago | parent | prev | next [-] | | I am not implying anything and mean only what I directly said. More specifically, I do not know if a privacy preserving method exists. This is different from thinking that it doesn't exist. | |
| ▲ | 63stack 6 days ago | parent | prev [-] | | While the question of "is it actually possible to do this in a privacy preserving way?" is certainly interesting, was there ever a _single_ occasion where a government had the option of doing something in a privacy preserving way, when a non-privacy preserving way was also possible? Politicians would absolutely kill for the idea of unmasking dissenters on internet forums. Even if the option is a possibility, they are deliberately not going to implement it. | | |
| ▲ | palata 6 days ago | parent [-] | | > was there ever a _single_ occasion I don't know where you live, but in my case, many. Beginning with the fact that I can buy groceries with cash. | | |
| ▲ | 63stack 6 days ago | parent [-] | | Example does not fit, when cash was introduced electronic money transfer was not an option. | | |
| ▲ | palata 6 days ago | parent [-] | | Health insurance being digitalised and encrypted on the insurance card in a decentralised way? Many e-IDs in many countries? | | |
| ▲ | 63stack 5 days ago | parent [-] | | I didn't know about e-IDs in other countries, but in Scandinavia (at least in Norway and Sweden, but I know the same system is used in Denmark as well) they are very much tied to your personal number which uniquely identifies you. Healthcare data is also not encrypted. | | |
| ▲ | palata 5 days ago | parent [-] | | Well the e-ID is an ID, so to the government it's tied to a person. But I know that in multiple countries it's possible to use the e-ID to only share the information necessary with the receiver in a way that the government cannot track. Typically, share only the fact that you are 18+ without sharing your name or birthday, and without the government being able to track where you shared that fact. This is privacy-preserving and modern. |
|
|
|
|
|
|
| |
| ▲ | egil 6 days ago | parent | prev | next [-] | | Fun fact: The Norwegian wine monopoly is rolling out exactly this to prevent scalpers buying up new releases. Each online release will require a signup in advance with a verified account. | |
| ▲ | xenotux 6 days ago | parent | prev | next [-] | | Eh? With the "anonymous" models that we're pushing for right now, nothing stops you from handing over your verification token (or the control of your browser) to a robot for a fee. The token issued by the verifier just says "yep, that's an adult human", not "this is John Doe, living at 123 Main St, Somewhere, USA". If it's burned, you can get a new one. If we move to a model where the token is permanently tied to your identity, there might be an incentive for you not to risk your token being added to a blocklist. But there's no shortage of people who need a bit of extra cash and for whom it's not a bad trade. So there will be a nearly-endless supply of "burner" tokens for use by trolls, scammers, evil crawlers, etc. | | | |
| ▲ | nikau 6 days ago | parent | prev [-] | | Can't wait to start my stolen id as a service for the botnets |
| |
| ▲ | kjkjadksj 6 days ago | parent | prev | next [-] | | Maybe there will be a way to certify humanness. Human testing facility could be a local office you walk over to get your “I am a human” hardware key. Maybe it expires after a week or so to ensure that you are still alive. | | |
| ▲ | palata 6 days ago | parent [-] | | But if that hardware key is privacy-preserving (i.e. websites don't get your identity when you use it), what prevents you from using it for your illegal activity? Scrapers and spam are built by humans, who could get such a hardware key. | | |
| ▲ | kjkjadksj 6 days ago | parent [-] | | You’d at least be limited to deploying a single verified scraper which might be too slow for people to bother with. | | |
| ▲ | palata 5 days ago | parent [-] | | Not even: the government is supposed to provide you with more than one token (how would you verify yourself as a human to more than one website otherwise?) | | |
| ▲ | kjkjadksj 5 days ago | parent [-] | | The idea would be that each connection requires the key, so you could both verify at more than one website and be limited to one instance per website. | |
| ▲ | palata 5 days ago | parent [-] | | If you use the same token on more than one website, it's not privacy-preserving anymore. |
|
|
|
|
| |
| ▲ | neumann 6 days ago | parent | prev [-] | | It will be hard to tune them to be just the right level of ignorant and slow as us though! | | |
| ▲ | cwmoore 6 days ago | parent [-] | | Soon enough there will be competing Unicode characters that can remove exclamation points. |
|
|
|
| ▲ | TylerE 6 days ago | parent | prev | next [-] |
No, it’s exactly because I understand that it bothers me. I understand it will be effective against bots for a few months at best, and legitimate human users will be stuck dealing with the damn thing for years to come. Just like captchas. |
|
| ▲ | ehnto 6 days ago | parent | prev | next [-] |
| It's been going on for decades now too. It's a cat and mouse game that will be with us for as long as people try to exploit online resources with bots. Which will be until the internet is divided into nation nets, suffocated by commercial interests, and we all decide to go play outside instead. |
| |
| ▲ | rob_c 6 days ago | parent [-] | | No. This went into overdrive in the "AI" era (crawlers feeding massive LLMs for ML chatbots). Frankly it's something I'm sad we don't yet see a lawsuit for, similar to The Times v. OpenAI. A lot of "new crawlers" claim to innocently forget about established standards like robots.txt. I just wish people would name and shame the massive companies at the top stomping on the rest of the internet in an effort to "get a step up over the competition". | | |
| ▲ | ehnto 6 days ago | parent | next [-] | | That doesn't really challenge what I said, there's not much "different this time" except the scale is commensurate to the era. Search engine crawlers used to take down websites as well. I understand and agree with what you are saying though, the cat and mouse is not necessarily technical. Part of solving the searchbot issue was also social, with things like robots.txt being a social contract between companies and websites, not a technical one. | |
| ▲ | account42 6 days ago | parent | prev [-] | | Yes, this is not a problem that will be solved with technical measures. Trying to do so is only going to make the web worse for us humans. |
|
|
|
| ▲ | interstice 6 days ago | parent | prev | next [-] |
| The cost benefit calculus for workarounds changes based on popularity. Your custom lock might be easy to break by a professional, but the handful of people who might ever care to pick it are unlikely to be trying that hard. A lock which lets you into 5% of houses however might be worth learning to break. |
|
| ▲ | necovek 6 days ago | parent | prev | next [-] |
| > The point is that they hadn't, and this worked for quite a while. That's what I was hoping to get from the "Numbers" section. I generally don't look up the logs or numbers on my tiny, personal web spaces hosted on my server, and I imagine I could, at some point, become the victim of aggressive crawling (or maybe I have without noticing because I've got an oversized server on a dual link connection). But the numbers actually only show the performance of doing the PoW, not the effect it has had on any site — I am just curious, and I'd love it if someone has done the analysis, ideally grouped by the bot type ("OpenAI bot was responsible for 17% of all requests, this got reduced from 900k requests a day to 0 a day"...). Search, unfortunately, only gives me all the "Anubis is helping fight aggressive crawling" blog articles, nothing with substance (I haven't tried hard, I admit). Edit: from further down the thread there's https://dukespace.lib.duke.edu/server/api/core/bitstreams/81... but no analysis of how many real customers were denied — more data would be even better |
|
| ▲ | topranks 6 days ago | parent | prev | next [-] |
| Sure. It might be a tool in the box. But it’s still cat and mouse. In my place we quickly concluded the scrapers have tons of compute and the “proof-of-work” aspect was meaningless to them. It’s simply the “response from site changed, need to change our scraping code” aspect that helps. |
|
| ▲ | rozab 6 days ago | parent | prev | next [-] |
| >But it takes a human some time and work to tell the crawler HOW. Yes, for these human-based challenges. But this challenge is defined in code. It's not like crawlers don't run JavaScript. It's 2025, they all use headless browsers, not curl. |
|
| ▲ | account42 6 days ago | parent | prev | next [-] |
| If you are going to rely on security through obscurity there are plenty of ways to do that that won't block actual humans because they dare use a non-mainstream browser. You can also do it without displaying cringeworthy art that is only there to get people to pay for the DRM solution you are peddling - that shit has no place in the open source ecosystem. |
| |
| ▲ | ForHackernews 6 days ago | parent [-] | | On the contrary: Making things look silly and unprofessional so that Big Serious Corporations With Money will pay thousands of dollars to whitelabel them is an OUTSTANDING solution for preserving software freedom while raising money for hardworking developers. | | |
| ▲ | account42 6 days ago | parent | next [-] | | I'd rather not raise money for "hardworking" developers if their work is spreading DRM on the web. And it's not just "Big Serious Corporations" that don't want to see your furry art. | | |
| ▲ | ForHackernews 6 days ago | parent [-] | | I'm not commenting on the value of this project (I wouldn't characterize captchas as DRM, but I see why you have that negative connotation) and I tend to agree with the OP that this is simply wasting energy, but the amount of seething over "anime catgirls" makes me want to write all the docs for my next projects in UwU text and charge for a whimsy-free version. (o˘◡˘o) | | |
| ▲ | account42 6 days ago | parent | next [-] | | Please do, it's better if people make their negative personality traits public so that you can avoid them before wasting your time. It will also be useful to show your hypocrisy when you inevitably complain about someone else doing something that you don't like. | | |
| ▲ | boneitis 6 days ago | parent | next [-] | | I don't think you need to try to die on this hill (primarily remarking w.r.t. your lumping in Anubis with Cloudflare/Google/et al. as one). In any case, I'm not appreciating the proliferation of the CAPTCHA-wall any more than you are. The mascot artist wrote in here in another thread about the design philosophies, and they are IMO a lot more honorable in comparison (to BigCo). Besides, it's MIT FOSS. Can't a site operator shoehorn in their own image if they were so inclined? | |
| ▲ | 6 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | Spivak 6 days ago | parent | prev [-] | | i love this thread because the Serious Business Man doesn't realize that purposeful unprofessionalism like anime art, silly uwu :3 catgirls, and writing with no capitalization are done specifically to be unpalatable to Serious Business Man—threatening to not interact with people like that is the funniest thing. negative signaling works! | |
| ▲ | SnuffBox 5 days ago | parent [-] | | Acting obnoxiously to piss people off makes you seem like an inexperienced teenager and alienates more than just "Serious Business Man". I look forward to this being taken to its logical extreme, when a niche subculture of internet nerds changes their entire online persona to revolve around scat pornography to spite "the normals"; I'm sure they'll be remembered fondly as witty and intelligent and not at all as mentally ill young people. |
|
|
| |
| ▲ | haskellshill 6 days ago | parent | prev [-] | | Sounds like a similar idea to what the "plus N-word license" is trying to accomplish |
|
|
|
| ▲ | dcow 6 days ago | parent | prev | next [-] |
I deployed a proof-of-work-based auth system once where every single request required hashing a new nonce. Compare with Anubis, where only one request a week requires it. The math said doing it that frequently, with variable Argon2 params the server could tune if it suspected bots, would be impactful enough to deter bots. Would I do that again? Probably not. These days I’d require a weekly mDL or equivalent credential presentation. I have to disagree that an anti-bot measure that only works globally for a few weeks until bots trivially bypass it is effective. In an arms race against bots the bots win. You have to outsmart them by challenging them to do something that only a human can do or is actually prohibitively expensive for bots to do at scale. Anubis doesn't pass that test. And now it’s littered everywhere, defunct and useless. |
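For reference, the general shape of such a per-request challenge, sketched here with SHA-256 rather than Argon2 and with made-up numbers, is just a nonce search whose difficulty the server can tune:

    import hashlib
    import os
    import time

    def leading_zero_bits(digest: bytes) -> int:
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
                continue
            bits += 8 - byte.bit_length()
            break
        return bits

    def solve(challenge: bytes, difficulty: int) -> int:
        # Grind nonces until the hash has `difficulty` leading zero bits;
        # expected work is roughly 2**difficulty hash calls.
        nonce = 0
        while True:
            digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if leading_zero_bits(digest) >= difficulty:
                return nonce
            nonce += 1

    challenge = os.urandom(16)
    start = time.time()
    nonce = solve(challenge, difficulty=18)      # ~260k hashes on average
    print(f"nonce={nonce} found in {time.time() - start:.2f}s")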
|
| ▲ | Kwpolska 6 days ago | parent | prev | next [-] |
| With all the SPAs out there, if you want to crawl the entire Web, you need a headless browser running JavaScript. Which will pass Anubis for free. |
|
| ▲ | dwaite 6 days ago | parent | prev | next [-] |
| > As the botmakers circumvent, new methods of proof-of-notbot will be made available. Yes, but the fundamental problem is that the AI crawler does the same amount of work as a legitimate user, not more. So if you design the work such that it takes five seconds on a five year old smartphone, it could inconvenience a large portion of your user base. But once that scheme is understood by the crawler, it will delay the start of their aggressive crawling by... well-under five seconds. An open source javascript challenge as a crawler blocker may work until it gets large enough for crawlers to care, but then they just have an engineer subscribe to changes on GitHub and have new challenge algorithms implemented before the majority of the deployment base migrates. |
|
| ▲ | 6 days ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | numpad0 6 days ago | parent | prev | next [-] |
Weren't there also weird behaviors reported by webadmins across the world, like crawlers used by LLM companies fetching evergreen data ad nauseam or something along those lines? I thought the point of adding PoW rather than just blocking them was to convince them to at least do it right. |
|
| ▲ | casey2 5 days ago | parent | prev | next [-] |
| You don't even need to go there. If the damn thing didn't work the site admin wouldn't have added it and kept it. Sure the program itself is jank in multiple ways but it solves the problem well enough. |
|
| ▲ | raxxorraxor 6 days ago | parent | prev | next [-] |
Every time we need to deploy such mechanisms, we reward those that already crawled the data and penalize newcomers and other honest crawlers. For some sites Anubis might be fitting, but it should be mindfully deployed. |
|
| ▲ | windward 6 days ago | parent | prev | next [-] |
| Many sufficiently technical people take to heart: - Everything is pwned - Security through obscurity is bad Without taking to heart: - What a threat model is And settle on a kind of permanent contrarian nihilist doomerism. Why eat greens? You'll die one day anyway. |
|
| ▲ | colordrops 6 days ago | parent | prev | next [-] |
On a side note, is the anime girl image customizable? I did a quick Google search and it seems that only the commercial version offers rebranding. |
| |
| ▲ | boomboomsubban 6 days ago | parent [-] | | It's free software. The paid version includes an option to change it, and they ask politely that you don't change it otherwise. |
|
|
| ▲ | TZubiri 6 days ago | parent | prev | next [-] |
As I understand it, this is Proof of Work, which is strictly not a cat-and-mouse situation. |
| |
| ▲ | account42 6 days ago | parent [-] | | It is, because you are dealing with crawlers that already have a nontrivial cost per page: adding something relatively trivial that is still within the bounds regular users accept won't change the motivations of bad actors at all. | |
| ▲ | TZubiri 6 days ago | parent [-] | | What is the existing cost per page? As far as I know, an HTTP request and some string parsing are fairly trivial, say 14 kB of bandwidth per page? |
|
|
|
| ▲ | wat10000 6 days ago | parent | prev | next [-] |
| Technical people are prone to black-and-white thinking, which makes it hard to understand that making something more difficult will cause people to do it less even though it’s still possible. |
| |
| ▲ | mattnewton 6 days ago | parent [-] | | I think the argument on offer is more, this juice isn't worth the squeeze. Each user is being slowed down and annoyed for something that bots will trivially bypass if they become aware of it. | | |
| ▲ | wat10000 6 days ago | parent [-] | | If they become aware of it and actually think it’s worthwhile. Malicious bots work by scaling, and implementing special cases for every random web site doesn’t scale. And it’s likely they never even notice. | | |
| ▲ | mattnewton 6 days ago | parent [-] | | If this kind of security by not being noticed is the plan, why not just have a trivial (but unique) captcha that asks the user to click a button with no battery wasting computation? | | |
| ▲ | account42 6 days ago | parent | next [-] | | Because you can't sell that as a commercial solution that the open source software ecosystem provides free advertising to. | |
| ▲ | wat10000 6 days ago | parent | prev [-] | | That works too, but not quite as well so it decreases the unwanted activity somewhat less. |
|
|
|
|
|
| ▲ | ramblerman 6 days ago | parent | prev | next [-] |
| Did you read the article? OP doesn't care about bots figuring it out. It's about the compute needed to do the work. It's quite an interesting piece, I feel like you projected something completely different onto it. Your point is valid, but completely adjacent. |
|
| ▲ | tptacek 6 days ago | parent | prev | next [-] |
| Respectfully, I think it's you missing the point here. None of this is to say you shouldn't use Anubis, but Tavis Ormandy is offering a computer science critique of how it purports to function. You don't have to care about computer science in this instance! But you can't dismiss it because it's computer science. Consider: An adaptive password hash like bcrypt or Argon2 uses a work function to apply asymmetric costs to adversaries (attackers who don't know the real password). Both users and attackers have to apply the work function, but the user gets ~constant value for it (they know the password, so to a first approx. they only have to call it once). Attackers have to iterate the function, potentially indefinitely, in the limit obtaining 0 reward for infinite cost. A blockchain cryptocurrency uses a work function principally as a synchronization mechanism. The work function itself doesn't have a meaningfully separate adversary. Everyone obtains the same value (the expected value of attempting to solve the next round of the block commitment puzzle) for each application of the work function. And note in this scenario most of the value returned from the work function goes to a small, centralized group of highly-capitalized specialists. A proof-of-work-based antiabuse system wants to function the way a password hash functions. You want to define an adversary and then find a way to incur asymmetric costs on them, so that the adversary gets minimal value compared to legitimate users. And this is in fact how proof-of-work-based antispam systems function: the value of sending a single spam message is so low that the EV of applying the work function is negative. But here we're talking about a system where legitimate users (human browsers) and scrapers get the same value for every application of the work function. The cost:value ratio is unchanged; it's just that everything is more expensive for everybody. You're getting the worst of both worlds: user-visible costs and a system that favors large centralized well-capitalized clients. There are antiabuse systems that do incur asymmetric costs on automated users. Youtube had (has?) one. Rather than simply attaching a constant extra cost for every request, it instead delivered a VM (through JS) to browsers, and programs for that VM. The VM and its programs were deliberately hard to reverse, and changed regularly. Part of their purpose was to verify, through a bunch of fussy side channels, that they were actually running on real browsers. Every time Youtube changed the VM, the bots had to do large amounts of new reversing work to keep up, but normal users didn't. This is also how the Blu-Ray BD+ system worked. The term of art for these systems is "content protection", which is what I think Anubis actually wants to be, but really isn't (yet?). The problem with "this is good because none of the scrapers even bother to do this POW yet" is that you don't need an annoying POW to get that value! You could just write a mildly complicated Javascript function, or do an automated captcha. |
| |
| ▲ | sugarpimpdorsey 6 days ago | parent | next [-] | | A lot of these passive types of anti-abuse systems rely on the rather bold assumption that making a bot perform a computation is expensive for the bot, but isn't for me as an ordinary user. According to whom or what data exactly? AI operators are clearly well-funded operations and the amount of electricity and CPU power is negligible. Software like Anubis and nearly all its identical predecessors grant you access after a single "proof". So you then have free rein to scrape the whole site. The best physical analogy is those shopping carts where you have to insert a quarter to unlock the cart, and you presumably get it back when you return the cart. The group this doesn't affect is the well-funded: a quarter is a small price to pay for leaving your cart in the middle of the parking lot. Those who suffer the most are the ones who can't find a quarter in the cupholder and end up filling their arms with groceries. Would you be richer if they didn't charge you a quarter? (For these anti-bot tools you're paying the electric company, not the site owner.) Maybe. But if you're Scrooge McDuck, who is counting? | |
| ▲ | tptacek 6 days ago | parent | next [-] | | Right, that's the point of the article. If you can tune asymmetric costs on bots/scrapers, it doesn't matter: you can drive bot costs to infinity without doing so for users. But if everyone's on a level playing field, POW is problematic. | |
| ▲ | account42 6 days ago | parent | prev | next [-] | | I like your example because the quarters for shopping cards are not universal everywhere. Some societies have either accepted shopping cart shrinkage as an acceptable cost of doing business or have found better ways to deter it. | |
| ▲ | Almondsetat 6 days ago | parent | prev [-] | | Scrapers are orders of magnitude faster than humans at browsing websites. If the challenge takes 1 second but a human stays on the page for 3 minutes, then it's negligible. But if the challenge takes 1 second and the scraper does its job in 5 seconds, you already have a 20% slowdown. | |
| ▲ | mewpmewp2 6 days ago | parent | next [-] | | By that logic you could just make your website in general load slower to make scraping harder. | | |
| ▲ | Almondsetat 6 days ago | parent [-] | | No, because in this case there are cookies involved. If the scraper accepts cookies then it's trivial to detect it and block it. If it doesn't, it will have to solve the challenge every single time. |
| |
| ▲ | rfoo 6 days ago | parent | prev | next [-] | | Scrapers do not care about having a 20% slowdown. All they care is being able to scale up. This does not block any scale up attempt. | |
| ▲ | 6 days ago | parent | prev [-] | | [deleted] |
|
| |
| ▲ | xena 6 days ago | parent | prev | next [-] | | For what it's worth, kernel.org seems to be running an old version of Anubis that predates the current challenge generation method. Previously it took information about the user request, hashed it, and then relied on that being idempotent to avoid having to store state. This didn't scale and was prone to issues like in the OP. The modern version of Anubis as of PR https://github.com/TecharoHQ/anubis/pull/749 uses a different flow. Minting a challenge generates state including 64 bytes of random data. This random data is sent to the client and used on the server side in order to validate challenge solutions. The core problem here is that kernel.org isn't upgrading their version of Anubis as it's released. I suspect this means they're also vulnerable to GHSA-jhjj-2g64-px7c. | | |
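A rough sketch of the stateful flow being described, as an illustration of the idea only (this is not Anubis's actual code; the names and the difficulty value are made up):

    import hashlib
    import os
    import secrets

    DIFFICULTY = 16                       # leading zero bits a solution must have
    challenges: dict[str, bytes] = {}     # in-memory store; real deployments persist this

    def mint_challenge() -> tuple[str, bytes]:
        challenge_id = secrets.token_hex(16)
        data = os.urandom(64)             # 64 bytes of server-side random state
        challenges[challenge_id] = data
        return challenge_id, data         # both go to the client

    def accepted(data: bytes, nonce: int) -> bool:
        digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

    def verify(challenge_id: str, nonce: int) -> bool:
        data = challenges.pop(challenge_id, None)   # single use, must match stored state
        return data is not None and accepted(data, nonce)

    # Client side: grind nonces against the random challenge the server handed out.
    cid, data = mint_challenge()
    nonce = 0
    while not accepted(data, nonce):
        nonce += 1
    assert verify(cid, nonce)

Because the challenge is random server-side state rather than something derived deterministically from the request, a client cannot precompute or forge its own challenges; it can only solve the one it was handed, once.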
| ▲ | account42 6 days ago | parent | next [-] | | OP is a real human user trying to make your DRM work with their system. That you consider this to be an "issue" that should be fixed says a lot. | |
| ▲ | tptacek 6 days ago | parent | prev [-] | | Right, I get that. I'm just saying that over the long term, you're going to have to find asymmetric costs to apply to scrapers, or it's not going to work. I'm not criticizing any specific implementation detail of your current system. It's good to have a place to take it! I think that's the valuable observation in this post. Tavis can tell me I'm wrong. :) |
| |
| ▲ | landhar 6 days ago | parent | prev | next [-] | | > But here we're talking about a system where legitimate users (human browsers) and scrapers get the same value for every application of the work function. The cost:value ratio is unchanged; it's just that everything is more expensive for everybody. You're getting the worst of both worlds: user-visible costs and a system that favors large centralized well-capitalized clients. Based on my own experience fighting these AI scrapers, I feel that the way they are actually implemented means that in practice there is asymmetry in the work scrapers have to do vs humans. The pattern these scrapers follow is that they are highly distributed. I’ll see a given {ip, UA} pair make a request to /foo immediately followed by _hundreds_ of requests from completely different {ip, UA} pairs to all the links from that page (i.e. /foo/a, /foo/b, /foo/c, etc.). This is a big part of what makes these AI crawlers such a challenge for us admins. There isn’t a whole lot we can do to apply regular rate-limiting techniques: the IPs are always changing and are no longer limited to corporate ASNs (I’m now seeing IPs belonging to consumer ISPs and even cell phone companies), and the User Agents all look genuine. But when looking through the logs you can see the pattern that all these unrelated requests are actually working together to perform a BFS traversal of your site. Given this pattern, I believe that’s what makes the Anubis approach actually work well in practice. A given user will encounter the challenge once when accessing the site for the first time, then they’ll be able to navigate through it without incurring any cost. The AI scrapers, meanwhile, would need to solve the challenge for every single one of their “nodes” (or whatever it is they would call their {ip, UA} pairs). From a site reliability perspective, I don’t even care if the crawlers manage to solve the challenge or not. That it manages to slow them down enough to rate limit them as a network is enough. To be clear: I don’t disagree with you that the cost incurred by regular human users is still high. But I don’t think it’s fair to say that the cost to the adversary is not asymmetrical here. It wouldn’t be if the AI crawlers hadn’t converged towards an implementation that behaves like a DDoS botnet. | |
| ▲ | akoboldfrying 6 days ago | parent | prev | next [-] | | The (almost only?) distinguishing factor between genuine users and bots is the total volume of requests, but this can still be used for asymmetric costs. If botPain > botPainThreshold and humanPain < humanPainThreshold then Anubis is working as intended. A key point is that those inequalities look different at the next level of detail. A very rough model might be: botPain = nBotRequests * cpuWorkPerRequest * dollarsPerCpuSecond humanPain = c_1 * max(elapsedTimePerRequest) + c_2 * avg(elapsedTimePerRequest) The article points out that the botPain Anubis currently generates is unfortunately much too low to hit any realistic threshold. But if the cost model I've suggested above is in any way realistic, then useful improvements would include: 1. More frequent but less taxing computation demands (this assumes c_1 >> c_2) 2. Parallel computation (this improves the human experience with no effect for bots) ETA: Concretely, regarding (1), I would tolerate 500ms lag on every page load (meaning forget about the 7-day cookie), and wouldn't notice 250ms. | | |
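Plugging assumed numbers into that model shows why the current botPain is so far below any plausible threshold; every figure below is an assumption chosen for illustration, not a measurement.

    # Back-of-envelope instantiation of the cost model above.
    n_bot_requests = 10_000_000          # pages a large scraper wants from one site
    cpu_seconds_per_challenge = 0.5      # the ~500ms per page a human would tolerate
    dollars_per_cpu_second = 0.00001     # roughly $0.036 per hour of cheap cloud CPU

    bot_pain = n_bot_requests * cpu_seconds_per_challenge * dollars_per_cpu_second
    print(f"bot cost to fetch the whole site: ${bot_pain:,.2f}")   # about $50

    # The human-side cost is elapsed time, not dollars: ~0.5s per page load,
    # near the threshold of annoyance but nowhere near the bot's budget.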
| ▲ | tptacek 6 days ago | parent [-] | | That's exactly what I'm saying isn't happening: the user pays some cost C per article, and the bot pays exactly the same cost C. Both obtain the same reward. That's not how Hashcash works. | | |
| ▲ | akoboldfrying 6 days ago | parent [-] | | I'm saying your notion of "the same cost" is off. They pay the same total CPU cost, but that isn't the actual perceived cost in each case. | | |
| ▲ | tptacek 6 days ago | parent [-] | | Can you flesh that out more? In the case of AI scrapers it seems especially clear: the model companies just want tokens, and are paying a (one-time) cost of C for N tokens. Again, with Hashcash, this isn't how it works: most outbound spam messages are worthless. The point of the system is to exploit the negative exponent on the attacker's value function. | | |
| ▲ | remexre 6 days ago | parent | next [-] | | The scraper breaking every time a new version of Anubis is deployed, until new anti-Anubis features are implemented, is the point; if the scrapers were well-engineered by a team that cared about the individual sites they're scraping, they probably wouldn't be so pathological towards forges. The human-labor cost of working around Anubis is unlikely to be paid unless it affects enough data to be worth dedicating time to, and the data they're trying to scrape can typically be obtained "respectfully" in those cases -- instead of hitting the git blame route on every file of every commit of every repo, just clone the repos and run it locally, etc. | | |
| ▲ | tptacek 6 days ago | parent [-] | | Sure, but if that's the case, you don't need the POW, which is what bugs people about this design. I'm not objecting to the idea of anti-bot content protection on websites. |
| |
| ▲ | akoboldfrying 6 days ago | parent | prev [-] | | Perhaps I caused confusion by writing "If botPain > botPainThreshold and humanPain < humanPainThreshold then Anubis is working as intended", as I'm not actually disputing that Anubis is currently ineffective against bots. (The article makes that point and I agree with it.) I'm arguing against what I take to be your stronger claim, namely that no "Anubis-like" countermeasure (meaning no countermeasure that charges each request the same amount of CPU in expectation) can work. I claim that the cost for the two classes of user are meaningfully different: bots care exclusively about the total CPU usage, while humans care about some subjective combination of average and worst-case elapsed times on page loads. Because the sheer number of requests done by bots is so much higher, there's an opportunity to hurt them disproportionately according to their cost model by tweaking Anubis to increase the frequency of checks but decrease each check's elapsed time below the threshold of human annoyance. |
|
|
|
| |
| ▲ | seba_dos1 6 days ago | parent | prev | next [-] | | > The term of art for these systems is "content protection", which is what I think Anubis actually wants to be, but really isn't (yet?). No, that's missing the point. Anubis is effectively a DDoS protection system; all the talk about AI bots comes from the fact that the latest wave of DDoS attacks was initiated by AI scrapers, whether intentionally or not. If these bots cloned git repos instead of unleashing hordes of the dumbest bots on Earth, pretending to be thousands and thousands of users browsing through the git blame web UI, there would be no need for Anubis. | |
| ▲ | tptacek 6 days ago | parent [-] | | I'm not moralizing, I'm talking about whether it can work. If it's your site, you don't need to justify putting anything in front of it. | | |
| ▲ | seba_dos1 6 days ago | parent [-] | | Did you accidentally reply to the wrong comment? (not trying to be snarky, just confused) The only "justification" there would be is that it keeps online a server that struggled under load before it was deployed. That's the whole reason major FLOSS projects and code forges have deployed Anubis. Nobody cares about bots downloading FLOSS code or kernel mailing list archives; they care about keeping their infrastructure running and whether it's being DDoSed or not. | |
| ▲ | tptacek 6 days ago | parent [-] | | I just said you didn't have to justify it. I don't care why you run it. Run whatever you want. The point of the post is that regardless of your reasons for running it, it's unlikely to work in the long run. | | |
| ▲ | seba_dos1 6 days ago | parent [-] | | And what I said is that the most visible deployments of Anubis did not adopt it as a content protection system of any kind, so it doesn't have to work that way at all for them. As long as the server doesn't struggle with load anymore after deploying Anubis, it's a win - and so far it works. (and frankly, it likely will only need to work until the bubble bursts, making "the long run" irrelevant) | |
| ▲ | rfoo 6 days ago | parent [-] | | > and frankly, it likely will only need to work until the bubble bursts, making "the long run" irrelevant Now I get why people are being so weirdly dismissive about the whole thing. Good luck, it's not going to "burst" any time soon. Or rather, a "burst" would not change the world in the direction you want it to. | |
| ▲ | seba_dos1 6 days ago | parent [-] | | Not exactly sure what you're talking about. The problem is caused by tons of shitty companies cutting corners to collect training data as fast as possible, fueled by the easy money you get by putting "AI" somewhere in your company's name. As soon as the investment boom is over, this will be largely gone. LLMs will continue to be trained and data will continue to be scraped, but that alone isn't the problem. Search engine crawlers somehow manage not to DDoS the servers they pull data from; competent AI scrapers can do the same. In fact, a competent AI scraper wouldn't be stopped at all by Anubis as it is right now, and yet Anubis works pretty well in practice. Go figure. |
|
|
|
|
|
| |
| ▲ | account42 6 days ago | parent | prev [-] | | > There are antiabuse systems that do incur asymmetric costs on automated users. Youtube had (has?) one. Rather than simply attaching a constant extra cost for every request, it instead delivered a VM (through JS) to browsers, and programs for that VM. The VM and its programs were deliberately hard to reverse, and changed regularly. Part of their purpose was to verify, through a bunch of fussy side channels, that they were actually running on real browsers. Every time Youtube changed the VM, the bots had to do large amounts of new reversing work to keep up, but normal users didn't. That depends on what you count as normal users, though. Users who want to use alternative players also have to deal with this, and since yt-dlp (and youtube-dl before it) has managed to provide a solution for those users, which bots can simply reuse, I'm not sure I'd call the scheme successful in any way. |
|
|
| ▲ | raverbashing 6 days ago | parent | prev | next [-] |
| [flagged] |
|
| ▲ | odo1242 6 days ago | parent | prev | next [-] |
| Also, it forces the crawler to gain code execution capabilities, which for many companies will just make them give up and scrape someone else. |
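As a rough illustration of that jump in effort (both snippets are hypothetical sketches and the URL is a placeholder, not anyone's real scraper): a bare HTTP client only ever receives the challenge interstitial, while getting past a JavaScript proof-of-work means shipping and running a real browser engine, for example via a headless browser:

```python
# Hypothetical comparison; the URL is a placeholder and no real site is implied.
import requests
from playwright.sync_api import sync_playwright

URL = "https://example.org/protected-page"

# Cheap crawler: a single HTTP request, no JavaScript engine. Against a JS
# challenge it only sees the interstitial HTML, never the content behind it.
resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
print("plain fetch:", resp.status_code, len(resp.text), "bytes")

# JS-capable crawler: launches a full headless browser, executes whatever
# challenge script the page serves, then reads the rendered result. Far more
# expensive per request in dependencies, CPU, and memory.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL)
    page.wait_for_load_state("networkidle")
    print("headless browser:", len(page.content()), "bytes")
    browser.close()
```

That per-request overhead, multiplied across millions of pages, is part of what pushes less determined operators to scrape easier targets instead.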
| |
| ▲ | wredcoll 6 days ago | parent [-] | | I don't know if you've noticed, but there's a few websites these days that use javascript as part of their display logic. | | |
| ▲ | odo1242 5 days ago | parent [-] | | Yes, and those sites take way more effort to crawl than other sites. They may still get crawled, but likely less often than the ones that don't use JavaScript for rendering (which is the main purpose of Anubis - saving bandwidth from crawlers who crawl sites way too often). (Also, note the difference between using JavaScript for display logic and requiring JavaScript to load any content at all. Most websites do the first; the second isn't quite as common.) |
|
|
|
| ▲ | sneak 6 days ago | parent | prev [-] |
| The fundamental failure of this is that you can’t publish data to the web and not publish data to the web. If you make things public, the public will use it. It’s ineffective. (And furry sex-subculture propaganda pushed by its author, which is out of place in such software.) |
| |
| ▲ | pferde 6 days ago | parent | next [-] | | The misguided parenthetical aside, this is not about resources being public; it is about bad actors accessing those resources in a highly inefficient and resource-intensive manner, effectively DDoSing the source. | |
| ▲ | sznio 6 days ago | parent | prev [-] | | >And furry sex-subculture propaganda pushed by its author if your first thought when seeing a catgirl is sex, i got bad news for you | | |
|