VladVladikoff a day ago

Wait a sec… if the TLS handshakes look different, would it be possible to have an nginx-level filter for traffic that claims to be a web browser (e.g. a Chrome user agent) but is really a Python/PHP script? That would account for the vast majority of malicious bot traffic, and I would love to just block it.
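A minimal sketch of the check being asked about. It assumes the TLS layer (for example a proxy or web-server module that computes JA3/JA4) already passes the application a fingerprint string for each connection. The fingerprint values in `KNOWN_NON_BROWSER` are hypothetical placeholders, not real hashes, and `looks_spoofed` is an illustrative name.

```python
# Hypothetical placeholder fingerprints; a real deployment would populate
# this from its own traffic or a maintained fingerprint database.
KNOWN_NON_BROWSER = {
    "fp-python-requests-placeholder": "python-requests",
    "fp-php-curl-placeholder": "php-curl",
}

# Substrings that mainstream browser User-Agent headers contain.
BROWSER_MARKERS = ("Chrome/", "Firefox/", "Safari/", "Edg/")


def claims_browser(user_agent: str) -> bool:
    """True if the User-Agent header advertises a mainstream browser."""
    return any(marker in user_agent for marker in BROWSER_MARKERS)


def looks_spoofed(user_agent: str, tls_fingerprint: str) -> bool:
    """Flag requests whose UA claims to be a browser while the TLS handshake
    matches a fingerprint known to come from a scripting library."""
    return claims_browser(user_agent) and tls_fingerprint in KNOWN_NON_BROWSER


chrome_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
print(looks_spoofed(chrome_ua, "fp-python-requests-placeholder"))  # True
print(looks_spoofed(chrome_ua, "fp-unknown"))                       # False
print(looks_spoofed("python-requests/2.31", "fp-python-requests-placeholder"))  # False
```

This only flags fingerprints already known to be non-browser rather than requiring a known-browser fingerprint, so an unfamiliar but legitimate browser is not blocked by default.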

aaron42net a day ago

Cloudflare uses JA3 and now JA4 TLS fingerprints, which are hashes of various TLS handshake parameters. https://github.com/FoxIO-LLC/ja4/blob/main/technical_details... has more details on how that works, and they do offer an Nginx module: https://github.com/FoxIO-LLC/ja4-nginx-module
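For reference, JA3 concatenates five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, EC point formats) as decimal values, joins each list with "-" and the fields with ",", skips GREASE values, and takes the MD5 of the result. The sketch below only builds that string from already-parsed values; extracting them from the raw handshake is left to the proxy or server module. The example values are illustrative, not captured from a real client. JA4 uses a different, more structured format described in the linked technical details.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0A0A, 0x1A1A, ..., 0xFAFA; JA3 skips them."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values) -> str:
    # Decimal values joined by "-", with GREASE entries removed.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 input string: five comma-separated ClientHello fields."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(*fields) -> str:
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Illustrative ClientHello values; 771 is 0x0303 (the TLS 1.2 version code).
hello = (771, [0x0A0A, 4865, 4866], [0, 23, 65281], [29, 23], [0])
print(ja3_string(*hello))  # 771,4865-4866,0-23-65281,29-23,0
print(ja3_hash(*hello))
```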

gruez a day ago

That's basically what security vendors like Cloudflare do, except with even more fingerprinting, such as a JavaScript challenge that checks the JS interpreter/DOM.

walrus01 a day ago

JS can also check client-side things like screen and window dimensions. Legit browsers will report these, and bots will too, but with a more uniform and predictable set of x and y dimensions per set of source IPs. There are lots of possibilities for JS endpoint fingerprinting.
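One way to turn "uniform dimensions per set of source IPs" into a signal, as a rough sketch: assume a JS beacon reports `(ip, width, height)` for each IPv4 visitor, group the reports by /24 prefix, and flag prefixes where many clients report very few distinct sizes. The prefix length, `min_reports`, and `max_distinct_ratio` are hypothetical tuning values. As the replies point out, maximized windows on common resolutions also repeat, so this is at best one weak signal among several.

```python
import ipaddress
from collections import defaultdict


def prefix_of(ip: str, prefix_len: int = 24) -> str:
    """Collapse an IPv4 address to its network prefix, e.g. 203.0.113.0/24."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def low_diversity_prefixes(reports, min_reports=20, max_distinct_ratio=0.1):
    """reports: iterable of (ip, width, height) tuples from a JS beacon.
    Returns prefixes where many clients reported very few distinct sizes."""
    sizes = defaultdict(list)
    for ip, width, height in reports:
        sizes[prefix_of(ip)].append((width, height))
    flagged = []
    for prefix, dims in sizes.items():
        distinct = len(set(dims))
        if len(dims) >= min_reports and distinct / len(dims) <= max_distinct_ratio:
            flagged.append(prefix)
    return sorted(flagged)


# Synthetic data: one prefix where every client is 1280x720, one with varied sizes.
bots = [(f"203.0.113.{i}", 1280, 720) for i in range(30)]
humans = [(f"198.51.100.{i}", 1200 + i, 800 + i) for i in range(30)]
print(low_diversity_prefixes(bots + humans))  # ['203.0.113.0/24']
```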

Fripplebubby 8 hours ago

I also present a uniform and predictable set of x and y dimensions per source IP, as a human user who maximizes my browser window.

gruez 7 hours ago

Maximizing reduces the variation, but there's still quite a bit of it because of different display resolutions, scaling settings, and OS configuration (e.g. short or tall taskbars).

walrus01 7 hours ago

Or settings like whether the macOS Dock auto-hides, which affects the vertical size of the browser window.

jrochkind1 a day ago

Well, I think that's exactly what OP is meant to stop you from doing.

immibis a day ago

Yes, and sites are doing this and it absolutely sucks because it's not reliable and blocks everyone who isn't using the latest Chrome on the latest Windows. Please don't whitelist TLS fingerprints unless you're actually under attack right now.
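A sketch of "only while under attack": keep a site-wide sliding window of request timestamps and turn strict fingerprint enforcement on only while the rate exceeds a threshold. `AttackSwitch`, the threshold, and the window length are hypothetical illustrations, not recommended values.

```python
import time
from collections import deque
from typing import Optional


class AttackSwitch:
    """Reports 'under attack' only while the number of requests seen in the
    last `window_seconds` exceeds `threshold` (illustrative values)."""

    def __init__(self, threshold: int, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self.timestamps = deque()  # one monotonic timestamp per request

    def record(self, now: Optional[float] = None) -> None:
        self.timestamps.append(time.monotonic() if now is None else now)

    def under_attack(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the sliding window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


switch = AttackSwitch(threshold=100)
for i in range(50):
    switch.record(now=float(i))               # normal load: 50 requests
print(switch.under_attack(now=50.0))          # False
for i in range(200):
    switch.record(now=100.0 + i * 0.1)        # burst: 200 requests in 20 s
print(switch.under_attack(now=120.0))         # True
print(switch.under_attack(now=1000.0))        # False, burst has aged out
```

A request handler would then apply the fingerprint allowlist only when `switch.under_attack()` is true and serve everyone normally otherwise.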

fc417fc802 a day ago

If you're going to whitelist (or block at all, really), please simply redirect all rejected connections to a proof of work scheme. At least that way things continue to work with only mild inconvenience.
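A sketch of "redirect instead of reject": a request that fails the fingerprint check is sent to a challenge page, and a client that has solved the challenge carries an HMAC-signed, expiring pass token. `SECRET_KEY`, `CHALLENGE_PATH`, `PASS_TTL`, and the notion of a `client_id` (for example a hash of IP plus User-Agent) are all hypothetical choices for illustration.

```python
import hashlib
import hmac
from typing import Optional, Tuple
from urllib.parse import quote

SECRET_KEY = b"replace-with-a-random-server-secret"  # hypothetical secret
CHALLENGE_PATH = "/pow-challenge"  # hypothetical endpoint serving the PoW page
PASS_TTL = 24 * 3600               # pass lifetime in seconds (illustrative)


def _sign(client_id: str, expires: int) -> str:
    msg = f"{client_id}:{expires}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_pass(client_id: str, now: int) -> str:
    """Token handed out after a successful proof of work: 'expires:signature'."""
    expires = now + PASS_TTL
    return f"{expires}:{_sign(client_id, expires)}"


def pass_valid(client_id: str, token: str, now: int) -> bool:
    try:
        expires_str, sig = token.split(":", 1)
        expires = int(expires_str)
    except ValueError:
        return False
    return now < expires and hmac.compare_digest(sig, _sign(client_id, expires))


def route(path: str, fingerprint_ok: bool, client_id: str,
          token: Optional[str], now: int) -> Tuple[str, Optional[str]]:
    """Allow known-good or already-verified clients; redirect everyone else."""
    if fingerprint_ok or (token and pass_valid(client_id, token, now)):
        return ("allow", None)
    return ("redirect", f"{CHALLENGE_PATH}?next={quote(path, safe='')}")


tok = issue_pass("c1", now=1000)
print(route("/page", False, "c1", None, now=1000))  # ('redirect', '/pow-challenge?next=%2Fpage')
print(route("/page", False, "c1", tok, now=2000))   # ('allow', None)
```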

jrochkind1 10 hours ago

I am very curious whether the current wave of mysterious distributed (AI?) bots will just run JavaScript and get past proof of work too...

Based on the fact that they request the same absolutely useless, duplicative pages from me hundreds of times per URL (like every possible combination of query params, even when it doesn't lead to unique content), and are so widely distributed that I'm only getting 1-5 requests per day from each IP...
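For the "every possible combination of query params" problem specifically, one mitigation is to canonicalize URLs: keep only the parameters that change the page, sort them, and redirect (or 404) anything that isn't already canonical, so the variants collapse onto one cacheable response. `MEANINGFUL_PARAMS` is a hypothetical allowlist that each site would define for itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only query parameters that actually change page content.
MEANINGFUL_PARAMS = {"page", "q"}


def canonical_url(url: str) -> str:
    """Drop irrelevant params and the fragment, and sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in MEANINGFUL_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://example.org/list?sort=asc&q=cats&utm_source=x&page=2"))
# https://example.org/list?page=2&q=cats
print(canonical_url("https://example.org/list?sort=desc"))
# https://example.org/list
```

A handler would compare the requested URL with `canonical_url(...)` and answer a mismatch with a 301 to the canonical form, so the expensive page is rendered once per meaningful variant.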

...cost does not seem to be a concern for them? Maybe they won't actually mind ~5 seconds of CPU on a proof of work either? They are really a mystery to me.
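Some rough arithmetic on whether ~5 seconds of CPU would matter to them. A hash-based PoW that requires d leading zero bits takes on average 2^d hashes, so the cost grows linearly with the number of challenges a client must solve. The hash rate of 1,000,000 hashes/second below is a hypothetical assumption, not a measured figure, and the request count is illustrative.

```python
import math


def expected_hashes(difficulty_bits: int) -> float:
    # Each hash has probability 2**-d of having d leading zero bits.
    return 2.0 ** difficulty_bits


def difficulty_for(target_seconds: float, hashes_per_second: float) -> int:
    """Difficulty whose expected solve time is closest to the target."""
    return max(1, round(math.log2(target_seconds * hashes_per_second)))


def fleet_cpu_hours(requests: int, difficulty_bits: int,
                    hashes_per_second: float) -> float:
    """Expected CPU-hours if every one of `requests` needs a fresh solve."""
    return requests * expected_hashes(difficulty_bits) / hashes_per_second / 3600


RATE = 1e6                      # assumed hashes/second on one core
d = difficulty_for(5, RATE)
print(d)                                    # 22
print(expected_hashes(d) / RATE)            # 4.194304 seconds on average
print(fleet_cpu_hours(1_000_000, d, RATE))  # roughly 1165 CPU-hours
```

Whether that deters them depends on how often a solved challenge can be reused, since a pass that covers many requests divides the cost accordingly.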

I am currently using Cloudflare Turnstile, which incorporates proof of work but also various other signals. It's working, but I know it does have false positives. I am working on implementing a simpler, JS-only proof of work (SHA-512-based), and I'm going to switch that in; if it works, great (because I don't want to keep out the false positives!), but if it doesn't, it's back to Turnstile.
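This is not the commenter's implementation, just a guess at the general shape of a SHA-512 proof of work: the server hands out a random challenge, the client searches for a nonce whose `SHA-512(challenge:nonce)` digest starts with at least `difficulty_bits` zero bits, and the server verifies with a single hash. The `challenge:nonce` encoding is an assumption. In the browser the solve loop would run in JS (for example with `crypto.subtle.digest("SHA-512", ...)`); `solve` is included here only so the round trip can be tested.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def new_challenge() -> str:
    # Random, unguessable challenge; the server should also record it so it
    # can expire it and refuse replays.
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    digest = hashlib.sha512(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client-side brute force: try nonces until one verifies."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


challenge = new_challenge()
nonce = solve(challenge, 12)   # about 4096 hashes on average
print(verify(challenge, nonce, 12))  # True
```

Verification costs the server one hash while solving costs the client about 2^d, which is the asymmetry the scheme relies on.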

The mysterious distributed idiot bots were too much. (Scaling up resources didn't help; they just scaled up their bot rates too!!!) I don't mind people scraping if they do it respectfully and reasonably; that's not what's been going on, and it's been an internet-wide phenomenon over the past year.

RKFADU_UOFCCLEL 9 hours ago

Blocking a hacking attack isn't really a thing: they just change IP address each time they learn a new fact about how your system works, and progress smoothly without interruption until they exfiltrate your data. The same goes for scrapers; the only difference is that there is no vulnerability to fix that will stop them.