postalcoder 14 hours ago

There may actually be some utility here. LLM agents refuse to traverse the links. Tested with gemini-3-pro, gpt-5.2, and opus 4.5.

edit: gpt-oss 20B & 120B both eagerly visit it.
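
For anyone who wants to reproduce this, here's a minimal sketch of one way to check it (I'm assuming the OpenAI Python SDK here; `fetch_url` is a hypothetical tool, the model name is a placeholder, and the other models were tested through their own clients):

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical fetch tool; all we care about is whether the model chooses to call it.
    fetch_tool = {
        "type": "function",
        "function": {
            "name": "fetch_url",
            "description": "Fetch the contents of a URL and return the page text",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    }

    link = "PASTE_GENERATED_LINK_HERE"  # the suspicious-looking wrapped link
    resp = client.chat.completions.create(
        model="MODEL_NAME",  # placeholder
        messages=[{"role": "user", "content": f"Summarize the page at {link}"}],
        tools=[fetch_tool],
    )

    msg = resp.choices[0].message
    # A refusal shows up as no tool calls plus an explanation in the message text.
    print(msg.tool_calls, msg.content)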

devsda 12 hours ago | parent | next [-]

I wish this came a day earlier.

There is a current "show your personal site" post on top of HN [1] with 1500+ comments. I wonder how many of those sites are or will be hammered by AI bots in the next few days to steal/scrape content.

If this can be used as a temporary guard against AI bots, that would have been a good opportunity to test it out.

1. https://news.ycombinator.com/item?id=46618714

aflukasz 6 hours ago | parent | next [-]

AI bots (or clients claiming to be them) show up on new sites surprisingly fast; at least that's what I've seen recently in a few places. They probably monitor Certificate Transparency logs, so you won't hide just by not linking to the site. Unless you're ok with staying in the shadow of naked HTTP, which never shows up in those logs.
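
To give a sense of how cheap that discovery is, here's a rough sketch of polling crt.sh's public JSON output (a serious crawler would stream the logs directly instead; the field names are as I understand crt.sh's output):

    import json
    import urllib.request

    domain = "example.com"  # placeholder

    # crt.sh exposes CT log entries as JSON; %25 is a URL-encoded wildcard.
    url = f"https://crt.sh/?q=%25.{domain}&output=json"

    with urllib.request.urlopen(url) as resp:
        entries = json.load(resp)

    hostnames = set()
    for entry in entries:
        # name_value may hold several SAN entries separated by newlines
        hostnames.update(entry["name_value"].splitlines())

    # Every cert you request (including staging/preview subdomains) lands here.
    print(sorted(hostnames))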

KetoManx64 4 hours ago | parent [-]

Get a wildcard cert and use it behind a reverse proxy.

RIMR 4 hours ago | parent [-]

Okay, but then what? Host your sites on something other than 'www' or '*', exclude them from search engines, and never link to them? And for the few people who do resolve those subdomains, you just have to hope they aren't using a DNS server owned by a company with an AI product (like Google, Microsoft, or Amazon)?

I really don't know how you're supposed to shield your content from AI without also shielding it from humanity.

xlii 12 hours ago | parent | prev | next [-]

I posted my site on the thread.

My site is hosted on Cloudflare, and I trust its protection far more than a flavor-of-the-month trick. This probably won't be patched anytime soon, but I'd rather have some people click my link than have humans avoid it along with the AI because it looks fishy :)

treebeard901 11 hours ago | parent | next [-]

I've been considering how feasible it would be to build a modern, denial-of-service version of the Low Orbit Ion Cannon by having various LLMs hammer sites until they break. I'm sure anything important already has Cloudflare-style DDoS mitigation, so maybe it's not that effective. Still, I think it's only a matter of time before someone figures it out.

There have also been several DDoS amplification attacks using various protocols...

devsda 8 hours ago | parent | prev | next [-]

Yeah, I meant using it as an experiment to test with two different links (or domains), not as a solution to evade bot traffic.

Still, I think it would be interesting to know whether anybody noticed a visible spike in bot traffic (especially AI) after sharing their site in that thread.

bookofjoe 4 hours ago | parent [-]

I didn't: no traffic before sharing, none since.

pawelduda 4 hours ago | parent | prev [-]

FYI, Cloudflare protection doesn't mean much nowadays if someone is even slightly determined to scrape the site.

Unless you mean DDoS protection; that one definitely helps.

testfrequency 11 hours ago | parent | prev | next [-]

Glad I’m not the only one who felt icky seeing that post.

I agree; my tinfoil-hat signal told me this was the perfect way to ask people for bespoke, hand-crafted content, which of course AI will love to slurp up to keep feeding the bear.

Dilettante_ 42 minutes ago | parent [-]

Not producing or publishing creative works out of fear that someone will find them and build on top of them is such a strange position to me, especially on a site that has its cultural basis in hacker culture.

kzalesak 7 hours ago | parent | prev | next [-]

I think that something specifically intended for this, like Anubis, is a much better option.

subscribed 7 hours ago | parent [-]

Anubis flatly refuses me access to several websites when I'm accessing them with a normal Chromium, JS enabled and whatnot, from a mainstream, typical OS, just with aggressive anti-tracking settings.

Not sure if that's the intended use case. At least Cloudflare politely asks for a CAPTCHA.
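
For context, the scheme behind tools like Anubis is a browser-side proof of work, roughly like the sketch below (illustrative only; this is not Anubis's actual code or parameters), which is why a normal JS-enabled browser is supposed to sail through:

    import hashlib
    import os

    DIFFICULTY = 4  # leading zero hex digits required; made-up value

    def solve(challenge: str) -> int:
        """What the browser-side script does: brute-force a nonce."""
        nonce = 0
        while not hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * DIFFICULTY):
            nonce += 1
        return nonce

    def verify(challenge: str, nonce: int) -> bool:
        """What the server does: one cheap hash to check the submitted nonce."""
        return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * DIFFICULTY)

    challenge = os.urandom(8).hex()
    nonce = solve(challenge)
    assert verify(challenge, nonce)
    print(f"challenge={challenge} nonce={nonce}")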

jnrk 12 hours ago | parent | prev | next [-]

Of course, the downside is that people might not even see your site at all because they’re afraid to click on that suspicious link.

postalcoder 11 hours ago | parent [-]

Site should add a reverse lookup. Provide the poison and antidote.
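
Something like this, conceptually (a hypothetical sketch; the host, path, and encoding here are made up, not the site's actual scheme):

    import base64

    SCARY_PREFIX = "https://example-redirector.invalid/wp-admin/.env/"  # made-up host/path

    def poison(url: str) -> str:
        # Wrap the real URL in a fishy-looking but reversible token.
        token = base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")
        return SCARY_PREFIX + token

    def antidote(link: str) -> str:
        # Reverse lookup: strip the prefix and decode back to the original URL.
        token = link[len(SCARY_PREFIX):]
        padding = "=" * (-len(token) % 4)
        return base64.urlsafe_b64decode(token + padding).decode()

    wrapped = poison("https://example.com/my-post")
    print(wrapped)
    print(antidote(wrapped))  # round-trips back to the original URL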

briandear 8 hours ago | parent | prev [-]

How is AI viewing content any different from Google? I don’t even use Google anymore because it’s so filled with SEO trash as to be useless for many things.

Zambyte 6 hours ago | parent [-]

Try hosting a cgit server on a 1u server in your bedroom and you'll see why.

PUSH_AX 10 hours ago | parent | prev | next [-]

LLM-led scraping might be deterred, since it requires an LLM to choose to kick off the fetch, but crawling for the purpose of training data is unlikely to be affected.

Barathkanna 12 hours ago | parent | prev [-]

Sounds like a useful signal for people building custom agents or models. Being able to control whether automated systems follow a link via metadata is an interesting lever, especially given how inconsistent current model heuristics are.