Arnavion 6 days ago

>This dance to get access is just a minor annoyance for me, but I question how it proves I’m not a bot. These steps can be trivially and cheaply automated.

>I think the end result is just an internet resource I need is a little harder to access, and we have to waste a small amount of energy.

No need to mimic the actual challenge process. Just change your user agent so it doesn't contain "Mozilla"; Anubis only serves the challenge to user agents containing that string. For myself, I just made a sideloaded browser extension to override the UA header for the handful of websites I visit that use Anubis, including those two kernel.org domains.

(Why do I do it? Most of these sites I don't enable JS or cookies for, so the challenge wouldn't pass anyway. For the ones I do enable JS or cookies for, various self-hosted GitLab instances, I don't consent to my electricity being used for this any more than if it were mining Monero or something.)
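
(If you want to sanity-check the "Mozilla" gate without an extension, here's a rough sketch using Node's built-in fetch; the host and the exact response shape are assumptions, not verified against any particular deployment:)

    // sanity-check.mjs: compare responses for a browser-like UA vs. a plain one.
    // Run with: node sanity-check.mjs
    const url = "https://lore.kernel.org/";
    for (const ua of ["Mozilla/5.0 (X11; Linux x86_64)", "plain-client/1.0"]) {
      const res = await fetch(url, { headers: { "User-Agent": ua } });
      const body = await res.text();
      // Heuristic: the Anubis interstitial typically mentions itself in the HTML.
      console.log(ua, "->", res.status, /anubis/i.test(body) ? "challenge" : "content");
    }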

johnecheck 6 days ago | parent | next [-]

Sadly, touching the user-agent header more or less instantly makes you uniquely identifiable.

Browser fingerprinting works best against people with unique headers. There are probably millions of people using an untouched Safari on iPhone. Once you touch your user-agent header, you're likely the only person in the world with that fingerprint.
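
(A toy sketch of the mechanics, with made-up header names and values; real trackers combine many more signals, but the core idea is just hashing the combination:)

    // Two visitors with stock Safari headers hash to the same bucket;
    // one edited User-Agent puts you in a bucket of size one.
    import { createHash } from "node:crypto";

    function fingerprint(headers) {
      const signal = [headers["user-agent"], headers["accept-language"]].join("|");
      return createHash("sha256").update(signal).digest("hex").slice(0, 12);
    }

    const stock = { "user-agent": "Mozilla/5.0 (iPhone ...) Safari/604.1", "accept-language": "en-US" };
    const edited = { ...stock, "user-agent": "anubis is crap" };
    console.log(fingerprint(stock), fingerprint(edited)); // different buckets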

sillywabbit 6 days ago | parent | next [-]

If someone's out to uniquely identify your activity on the internet, your User-Agent string is going to be the least of your problems.

_def 6 days ago | parent [-]

Not sure what you mean, as exactly this is happening currently on 99% of the web. Brought to you by: ads

MathMonkeyMan 6 days ago | parent | next [-]

If you're browsing with a browser, then there are 1000 ways to identify you. If you're browsing without a browser, then there is at least one way to identify you.

amusingimpala75 6 days ago | parent | prev | next [-]

I think what they meant is: there are already so many other ways to fingerprint (say, canvas) that a common user agent doesn't significantly help you

johnecheck 5 days ago | parent [-]

'There's so many cliffs around that not jumping off that one barely helps you'.

I meeeeeannn... sure? I know browser fingerprinting works quite well without them, but custom headers are game over in terms of not getting tracked.

Arnavion 6 days ago | parent | prev | next [-]

UA fingerprinting isn't a problem for me. As I said, I only modify the UA for the handful of Anubis-using sites I visit. I trust those sites enough that them fingerprinting me is unlikely, and it wouldn't be a problem even if they did.

NoMoreNicksLeft 6 days ago | parent | prev | next [-]

I'll set mine to "null" if the rest of you will set yours...

gabeio 6 days ago | parent [-]

The string “null” or actually null? I've recently seen a huge amount of bot traffic with no UA at all, and I just outright block it. It's almost entirely (Microsoft cloud) Azure script attacks.
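
(For illustration, the kind of check I mean: a minimal Express sketch, not my actual setup:)

    // Reject requests that arrive with no User-Agent header at all.
    import express from "express";
    const app = express();

    app.use((req, res, next) => {
      if (!req.get("user-agent")) return res.status(403).end(); // empty-UA bots
      next();
    });

    app.get("/", (req, res) => res.send("ok"));
    app.listen(8080);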

NoMoreNicksLeft 6 days ago | parent [-]

I was thinking the string "null". But if you have a better idea.

account42 6 days ago | parent [-]

User-Agent: '; DROP TABLE blocked_bots;

codedokode 6 days ago | parent | prev | next [-]

If your headers are new every time then it is very difficult to figure out who is who.

spoaceman7777 6 days ago | parent | next [-]

Yes, but it puts you in the incredibly small bucket of "users with weird headers that don't mesh well", and makes the rest of the (many) other fingerprinting techniques all the more accurate.

JoshTriplett 6 days ago | parent | prev | next [-]

> If your headers are new every time then it is very difficult to figure out who is who.

https://xkcd.com/1105/

kelseydh 6 days ago | parent | prev | next [-]

It is very easy unless the IP address is also switching up.

heavyset_go 6 days ago | parent | prev [-]

It's very easy to train a model to identify anomalies like that.

johnecheck 5 days ago | parent [-]

While it's definitely possible to train a model for that, 'very easy' is nonsense.

Unless you've got some superintelligence hidden somewhere, you'd choose a neural net. To train one, you need a large supply of LABELED data. Building that dataset seems like a challenge; after all, we have no scalable method for classifying this traffic yet.

andrewmcwatters 6 days ago | parent | prev | next [-]

Yes, but if you can detect that a site isn't using a major fingerprinting provider, you can take the bet, and win more often than not, that your adversary isn't tracking visitor probabilities.

jagged-chisel 6 days ago | parent | prev | next [-]

I wouldn’t think the intention is to s/Mozilla// but to select another well-known UA string.

Arnavion 6 days ago | parent | next [-]

The string I use in my extension is "anubis is crap". I took it from a different FF extension that had been posted in a /g/ thread about Anubis, which is where I got the idea from in the first place. I don't use other people's extensions if I can help it (because of the obvious risk), but I figured I'd use the same string in my own extension so that I'd be lumped in with that extension's users in user-agent statistics.

CursedSilicon 6 days ago | parent [-]

It's a bit telling that you "don't use extensions if you can help it" but trust advice from a 4chan board

Arnavion 6 days ago | parent | next [-]

It's also a bit telling that you read the phrase "I took it from a different FF extension that had been posted" and interpreted it as taking advice instead of reading source code.

account42 6 days ago | parent | prev | next [-]

It's telling that he understands the difference between taking something he can't fully verify and taking simple hints that improve his understanding?

username135 6 days ago | parent | prev [-]

4chan, the world's greatest hacker

soulofmischief 6 days ago | parent | prev | next [-]

The UA will be compared against other data points such as screen resolution, fonts, plugins, etc., which means you are definitely more identifiable if you change just the UA than if you change your entire browser or operating system.

throwawayffffas 6 days ago | parent | prev [-]

I don't think there are any.

Because servers would serve different content based on the user agent, virtually all browsers start with Mozilla/5.0...
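
For example, a current Chrome UA still carries the whole fossil record (shape from memory; version numbers illustrative):

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36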

extraduder_ire 6 days ago | parent [-]

curl, wget, lynx, and elinks all don't by default (I checked). Mainstream web browsers likely all do, and will forever.

userbinator 6 days ago | parent [-]

So Anubis lets curl through, while blocking any non-mainstream browser that says "Mozilla" in its UA just for best compatibility, and calls that a "bot"? WTF.

Animats 6 days ago | parent | prev | next [-]

> (Why do I do it? For most of them I don't enable JS so the challenge wouldn't pass anyway. For the ones that I do enable JS for, various self-hosted gitlab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)

Hm. If your site is "sticky", can it mine Monero or something in the background?

We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"

mikestew 6 days ago | parent [-]

> We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"

Doesn't Safari sort of already do that? "This tab is using significant power", or summat? I know I've seen that message, I just don't have a good repro.

qualeed 6 days ago | parent [-]

Edge does, as well. It drops a warning in the middle of the screen, displays the resource-hogging tab, and asks whether you want to force-close the tab or wait.

zahlman 6 days ago | parent | prev | next [-]

> Just change your user agent to not have "Mozilla" in it. Anubis only serves you the challenge if you have that.

Won't that break many other things? My understanding was that basically everyone's user-agent string nowadays is packed with a full suite of standard lies.

Arnavion 6 days ago | parent | next [-]

It doesn't break the two kernel.org domains that the article is about, nor any of the others I use. At least not in a way that I noticed.

throwawayffffas 6 days ago | parent | prev [-]

In 2025, I think most of the web has moved on from checking user-agent strings. Your bank might still do it, but they won't be running Anubis.

Aachen 6 days ago | parent | next [-]

Nope, they're on Cloudflare, so all my banking traffic can be intercepted by a foreign company I have no relation to. The web is really headed in a great direction :)

account42 6 days ago | parent | prev [-]

The web as a whole definitely has not moved on from that.

msephton 6 days ago | parent | prev | next [-]

I'm interested in your extension. I'm wondering if I could do something similar to force text encoding of pages into Japanese.

Arnavion 5 days ago | parent [-]

If your Firefox supports sideloading extensions then making extensions that modify request or response headers is easy.

The API is all documented at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web... . My Anubis extension modifies request headers using `browser.webRequest.onBeforeSendHeaders.addListener()`. Your case sounds like modifying response headers, which would be `browser.webRequest.onHeadersReceived.addListener()`. The `manifest.json` you'll need to write to register this JS code as a background script, and whatever permissions you need, are documented there too.

Then zip the manifest and the script together, rename the zip file to "<id_in_manifest>.xpi", place it in the sideloaded-extensions directory (depends on distro, e.g. /usr/lib/firefox/browser/extensions), restart Firefox, and it should show up. If you need to debug it, the about:debugging#/runtime/this-firefox page can launch a devtools window connected to the background script.
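
A minimal sketch of the two files, assuming my kernel.org use case (the UA string and site list are just my setup; adjust to taste):

manifest.json (MV2, so blocking webRequest still works in Firefox):

    {
      "manifest_version": 2,
      "name": "ua-override",
      "version": "1.0",
      "browser_specific_settings": { "gecko": { "id": "ua-override@example.invalid" } },
      "permissions": ["webRequest", "webRequestBlocking", "https://git.kernel.org/*", "https://lore.kernel.org/*"],
      "background": { "scripts": ["background.js"] }
    }

background.js:

    browser.webRequest.onBeforeSendHeaders.addListener(
      (details) => ({
        // Replace the User-Agent value, leave every other header untouched.
        requestHeaders: details.requestHeaders.map((h) =>
          h.name.toLowerCase() === "user-agent" ? { name: h.name, value: "anubis is crap" } : h
        ),
      }),
      { urls: ["https://git.kernel.org/*", "https://lore.kernel.org/*"] },
      ["blocking", "requestHeaders"]
    );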

msephton 5 days ago | parent [-]

Cheers! I'm in Safari so I'll see if there's a match.

semiquaver 6 days ago | parent | prev | next [-]

Doesn’t that just mean the AI bots can do the same? So what’s the point?

danieltanfh95 6 days ago | parent | prev | next [-]

wtf? how is this then better than a captcha or something similar?!

throw84a747b4 6 days ago | parent | prev [-]

[flagged]

gruez 6 days ago | parent | next [-]

>Not only is Anubis a poorly thought out solution from an AI sympathizer [...]

But the project description describes it as a project to stop AI crawlers?

> Weighs the soul of incoming HTTP requests to stop AI crawlers

throw84a747b4 6 days ago | parent | next [-]

Why would a company that wants to stop AI crawlers give talks on LLMs and diffusion models at AI conferences?

Why would they use AI art for the first Anubis mascot until GitHub users called out the hypocrisy on the issue tracker?

Why would they use Stable Diffusion art in their blogposts until Mastodon and Bluesky users called them out on it?

cyanydeez 6 days ago | parent | next [-]

Likely the only way to stop AI is with purpose-built, fundamentally sound "machine learning", a.k.a. AI.

AI slop is mass produced, but there's likely great potential for really useful AI models with very limited scopes.

s1mplicissimus 6 days ago | parent | prev | next [-]

[flagged]

johnnyanmac 6 days ago | parent [-]

How so? He raises good points, and now there are two responses making emotional appeals instead of addressing the hypocrisy.

I know it's 2025, but I expected a bit better from this community.

Dylan16807 6 days ago | parent [-]

Are you arguing that "poorly thought out solution from an AI sympathizer, it was probably vibecoded" is not an emotional appeal, or do you think it's unfair to respond to an emotional appeal with an emotional criticism?

johnnyanmac 6 days ago | parent [-]

I think their later points, based on actual events, do lend credence to at least looking deeper. The incentives don't line up for a tool meant to combat AI to use AI for promotional material.

Dylan16807 6 days ago | parent [-]

Well it's a tool to combat scraping, and in particular text-focused scraping. That's very far away from image generation, even with both falling under "AI".

It's worth some consideration but it doesn't leave the whole thing feeling nonsensical or fake.

Imustaskforhelp 6 days ago | parent | prev [-]

I am not against AI art completely, since I think of it as editing rather than art itself. My thoughts on AI art are nuanced and worth discussing some other day; let's talk about the author of Anubis and the story of Anubis.

So, I hope you know the entire story behind Anubis: first, they were hosting their own git server (I think?), and Amazon's AI-related department was basically DDoSing their server, in some sense, by trying to scrape it, so they created Anubis to prevent that.

The idea isn't that new; it is just proof of work. They created it first for their own use, and I think the author is an AI researcher or otherwise works in AI, so for them using AI pics wasn't that big of a deal; I'm pretty sure they had some reason behind it, and even that has since been changed.

Stop whining about free projects/labour, man. The same people comment that these AI scrapers are scraping so many websites and taking the livelihood of website makers, and now that you have someone who just gave you a defence for free, you are nitpicking the wrong things.

You can just fork it without the anime images or without the AI thing if you don't align with them and their philosophy.

Man, now I feel the Mandela effect, as I read somewhere, on their blog or elsewhere, that they themselves feel the hypocrisy, or something along those lines (pardon me if I am wrong, I usually am). But they themselves (I think?) would like to stop working in the AI industry while making an anti-AI-scraper tool, but they might need more donations for that, iirc; they know the hypocrisy themselves.

johnnyanmac 6 days ago | parent | next [-]

> My thoughts on AI art are nuanced and worth discussing some other day

If the argument is about all this AI support behind an anti-AI-crawler tool, I don't think it's a good argument to ignore.

And as long as the 3C's aren't followed, there's no subtlety here. You don't get to use "edited" art in a commercial product except in fair-use cases.

>Stop whining about free projects/labour man.

Yes, I'm sure this sort of argument is coming from someone with very subtle and nuanced thoughts. "Free" ruined the internet very quickly when corporations figured out that's all they need for entry. They can monetize behind the scenes or after market capture.

Imustaskforhelp 6 days ago | parent [-]

Okay, I understand your point and I agree with you. Maybe my wording was a little harsh.

Let's see. The thing I am trying to say is that, in my opinion, even with the AI art, that is not the point.

The point remains whether Anubis is actually useful at what it does, and whether it is worth it.

Also, I understand that "free" ruined the internet very quickly, but this is free and open source. Open source has made the internet a hundred times better, imo.

I genuinely don't think that the author had any wrong intentions with using AI art.

And since we have opened the box, we might as well talk about my thoughts on AI.

Basically, I think there are a few angles on AI/AI art: its efficacy at doing what it is advertised to do, its effect on markets, and its effect on the climate.

Efficacy at doing what it claims: I absolutely agree there is a lot of hype around it, but AI art, even from open-source models, has become good enough for some very "basic tasks", imo. For example, I had a Discord server and actually used AI art for the logo, because people wanted a logo, I am not an artist, and I couldn't commission anyone for something so small. I told everyone immediately and nobody seemed to care; even after more people joined, literally no one asked whether it was AI-generated. I was expecting a single person to notice, but nope.

Another point is the climate. I think AI itself is pretty efficient (there were posts comparing it to a lightbulb), but the problem is how massively it is used at scale and how much swings in demand affect a datacentre's power draw; when demand changes, the generators have to tweak their output, and in that process things become really inefficient. Still, I am surprised we call out AI when there is Bitcoin, whose proof of work literally does nothing but waste energy, while other cryptocoins can have close to zero fees and near-instant settlement; yet the crypto market is in a bubble. Stablecoins are the only good thing to come out of it.

Now, that being said, its effect on job markets. I am not an artist, but I imagine it is frustrating to see AI replicate art. But the point is, nobody wants AI art! There is no economic incentive to make AI art unless you genuinely can't commission someone (which is what I did on my server) or you are using it as backup art until you can commission the real thing (which is what Anubis did). In my opinion, if someone can't pay anyway, like I couldn't for my Discord server, they would just build one themselves or use some CC0 art with full credits.

Which is why I don't consider AI art "art". It's a backup, more like editing. People will throw wrenches at you if you get caught using it, and that's a good thing. But we should treat AI art as a backup at most, and people should definitely push against it so that, where possible, real artists get the job done.

I don't consider AI art a mere gimmick anymore, but even then, I can understand people using it as a backup until they can commission real art or build something themselves.

shkkmo 6 days ago | parent | prev [-]

> Stop whining about free projects/labour, man. The same people comment that these AI scrapers are scraping so many websites and taking the livelihood of website makers, and now that you have someone who just gave you a defence for free, you are nitpicking the wrong things.

That isn't the issue. The issue is that this tool is not fit for purpose and is inappropriate to be used by the projects that have adopted it.

The proof-of-work scheme is idiotic. As explained in the article, it's super easy for bad actors to mine enough tokens to bypass it, while it interferes with and wastes the time of good actors.

It's almost like the author deliberately designed a tool that only looks like it is doing something while actually trivially allowing the very thing it was supposedly built to prevent.

Imustaskforhelp 6 days ago | parent [-]

Hm, yeah, this is a fair criticism actually; as I also said in another comment just now, we need to discuss more whether Anubis is actually useful at what it claims to do.

You raise a good point, man. What do you suggest should be done instead of what Anubis is doing right now, for the same outcome (not getting effectively DDoSed by AI scrapers)?

shkkmo 5 days ago | parent [-]

The underlying point is mentioned early in the article:

> The traditional solution to blocking nuisance crawlers is to use a combination of rate limiting and CAPTCHAs. The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans. This isn’t perfect of course, we can debate the accessibility tradeoffs and weaknesses, but conceptually the idea makes some sense.

> Anubis – confusingly – inverts this idea. It insists visitors solve a problem trivial for computers, but impossible for humans.

Fundamentally, the idea that PoW is a good way to tell humans from bots just doesn't work.
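
To put numbers on "trivial for computers": a rough sketch, assuming the common SHA-256 prefix-difficulty scheme (Node):

    // Find a nonce such that sha256(challenge + nonce) starts with
    // `difficulty` zero hex digits. At difficulty 4 that's ~16^4 ≈ 65k
    // hashes: a fraction of a second on a single core.
    import { createHash } from "node:crypto";

    function solve(challenge, difficulty) {
      const prefix = "0".repeat(difficulty);
      for (let nonce = 0; ; nonce++) {
        const hash = createHash("sha256").update(challenge + nonce).digest("hex");
        if (hash.startsWith(prefix)) return { nonce, hash };
      }
    }

    console.log(solve("example-challenge", 4));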

Captchas, rate limiting, authentication, etc. are all part of the solution.

The more bespoke a captcha solution is, the less likely that bots, especially the kind of low effort bots that ignore rate limits and hammer sites, will have the ability to break it.

Arguably, Anubis has a much better harm-vs-protection ratio at a much lower difficulty setting, where it functions less as a PoW system and more as an obscure way to block low-effort bots. Of course, the more it gets adopted, the less well this will work.

account42 6 days ago | parent | prev [-]

AI companies are just as interested in stopping competing crawlers as anyone else.
