faangguyindia 8 hours ago

Yesterday I logged into Cloudflare and found that it had blocked ChatGPT and Claude from accessing my site: https://macrocodex.app

This is bad because there are fitness guides on my domain (https://macrocodex.app/guides) which newbies often paste into ChatGPT and ask it to simplify.

I enabled crawling for LLMs. There is a lot of misinformation in the fitness field, so it's better if LLMs get their content from people who at least have experience in the field.

robhoeijmakers 7 hours ago | parent [-]

It is good to make a proper distinction, in the ChatGPT context, between crawlers and agents. The crawlers fetch content to build a new model; the agents serve content to users. The latter can be very useful.

tardedmeme 7 hours ago | parent [-]

They use different user-agent strings. The crawlers obfuscate themselves and use residential proxies. The agents call themselves ChatGPT-User. Of course Cloudflare wants OpenAI to pay them for not blocking ChatGPT-User by default.
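To make the user-agent distinction concrete, here is a rough sketch of how a site could sort requests by their User-Agent header. ChatGPT-User is the name given above; GPTBot is an illustrative crawler token that is not mentioned in this thread, and the function name and labels are made up for the example. It only catches clients that announce themselves, so anything hiding behind a browser-like string falls through as "unknown".

```python
# Tokens are matched as case-insensitive substrings of the User-Agent header.
AGENT_TOKENS = ("ChatGPT-User",)  # named in the thread
CRAWLER_TOKENS = ("GPTBot",)      # assumption: illustrative, not named in the thread


def classify_user_agent(user_agent: str) -> str:
    """Return "agent", "crawler", or "unknown" for a User-Agent string.

    This is a hypothetical helper: it trusts whatever the client sends,
    so a spoofed or obfuscated header is classified as "unknown".
    """
    ua = user_agent.lower()
    if any(token.lower() in ua for token in AGENT_TOKENS):
        return "agent"
    if any(token.lower() in ua for token in CRAWLER_TOKENS):
        return "crawler"
    return "unknown"
```

A site owner could allow "agent" requests while still challenging "crawler" ones, but since the header is self-reported, this is not a defense against crawlers that disguise themselves.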

faangguyindia 7 hours ago | parent [-]

It's true: crawlers used for AI training don't identify themselves as crawlers at all.