I prompted ChatGPT, Claude, Perplexity, and Gemini and watched my Nginx logs (surfacedby.com)
120 points by startages 5 hours ago | 22 comments
lambda 4 hours ago
Gah, the writing on this is so painful to read; it feels like it was most likely written by an LLM. The writing style is so unclear that it's hard to figure out one of the key points: it mentions that Gemini doesn't use a distinct user-agent for its grounding, but it doesn't say whether Gemini actually hit the endpoint during the test, though it kind of implies nothing showed up with "Silence from Google is not evidence of no fetch."

Uh, if there are no requests coming in live, that means no fetch; it's using a cache of your site. It makes a difference whether it fetches a page live or uses a cached copy from a previous crawl: that tells you how up-to-date Gemini's answers will be when people ask it questions about your website. But I guess the LLM writing this article just wanted to make things sound punchy and impressive, not actually communicate useful information.

Anyhow, LLM marketing spam from an LLM marketing spam company. Bleh.
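The live-fetch-versus-cache question can be checked directly in the access log: look for any request to the prompted page during the test window, regardless of user-agent. A minimal sketch, assuming Nginx's default `combined` log format; the function names are hypothetical, not from the article.

```python
import re
from datetime import datetime

# Nginx's default "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 10/Oct/2025:13:55:36 +0000


def parse_line(line):
    """Return the fields of one combined-format line as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], TIME_FORMAT)
    return entry


def requests_in_window(lines, path, start, end):
    """Every request for `path` logged between `start` and `end` (inclusive), whatever its UA."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        # $request looks like "GET /pricing HTTP/1.1"; the second token is the path.
        parts = entry["request"].split()
        if len(parts) >= 2 and parts[1] == path and start <= entry["time"] <= end:
            hits.append(entry)
    return hits
```

Filtering on path and time rather than on a known UA keeps unlabeled fetchers in view; an empty result for the window is consistent with the answer coming from a cached copy.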
reincoder 19 minutes ago
I used the same methodology to observe AI crawlers. This is not an investigative blog post; rather, it is meant for our (IPinfo) customers who are asking us to identify IP addresses as "AI Agents" or, more accurately, "AI Crawlers": https://community.ipinfo.io/t/can-we-detect-ai-agents-we-can...

Most AI crawlers self-identify with a UA. However, Grok uses residential proxies (resproxies) and sends a high volume of simultaneous requests. Even though we can detect resproxies, it is not possible to map these resproxy IPs to Grok.

I still could not figure out why I saw legitimate Googlebot IPs when I asked Perplexity to review the website. I verified those Googlebot IPs using both the UA and the IP address ranges published by Google.
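The two-part check described above (Googlebot UA plus a source IP inside Google's published ranges) can be sketched with Python's standard `ipaddress` module. This assumes the published list has been downloaded into a dict with a `"prefixes"` list whose items carry `"ipv4Prefix"` or `"ipv6Prefix"`; treat that shape as an assumption and confirm it against Google's current file. Google also documents a reverse-DNS check, which is not shown here.

```python
import ipaddress


def load_prefixes(published):
    """Turn a dict shaped like Google's crawler-range JSON (assumed shape) into network objects."""
    networks = []
    for item in published.get("prefixes", []):
        cidr = item.get("ipv4Prefix") or item.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_verified_googlebot(ip, user_agent, networks):
    """True only if the UA claims Googlebot AND the source IP is inside a published range."""
    if "Googlebot" not in user_agent:
        return False
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
    return any(addr in net for net in networks)
```

A spoofed UA from an unlisted address fails the range test, and a listed address with a non-Googlebot UA fails the UA test; only the combination counts as verified.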
nryoo 4 hours ago
So the state of AI in 2026: ChatGPT DDoS-lite, Claude the polite one that actually reads the rules, Perplexity maybe shows up, and Google was already in your house.
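"Reads the rules" here presumably means robots.txt. Whether an agent honors it is something only the logs show (for example, a request for `/robots.txt` before the page), but what a given rule set permits per agent can be evaluated with Python's standard `urllib.robotparser`. A small sketch; the robots.txt content and the `ExampleAIBot` agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named agent is kept out of /drafts/, everyone else may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /drafts/

User-agent: *
Allow: /
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Would a crawler that respects these rules fetch `url` as `agent`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

robots.txt is advisory, so a `False` here describes what a polite agent should do, not what the server enforces.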
ctime 4 hours ago
Does smack of AI-ness. The IPs listed in the output are from reserved ranges as well, as if they were intentionally obfuscated (but this was not disclosed to the reader). It’s the kind of obfuscation an AI would do (using esoteric bogon ranges, too).
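Spotting non-routable addresses in a write-up can be done mechanically with Python's standard `ipaddress` module. A small sketch; the function name is hypothetical. Documentation ranges such as 203.0.113.0/24 (RFC 5737) are a common pick for placeholder IPs.

```python
import ipaddress


def non_routable(ips):
    """Return the addresses from `ips` that are not globally routable.

    `is_global` is False for private, loopback, link-local, documentation
    (e.g. 192.0.2.0/24, 203.0.113.0/24) and other special-purpose space.
    """
    flagged = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if not addr.is_global:
            flagged.append(str(addr))
    return flagged
```

On a public server logging the real client address, crawler traffic should not land in these ranges; private addresses can legitimately appear if the server sits behind a proxy that rewrites the source address.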
Auburn_AI 3 hours ago
Interesting methodology. I've been running Claude Sonnet in production content workflows for 6 months, and the pattern I notice most is that each model fetches the URLs in a prompt with slightly different priorities: Claude tends to fetch top-of-message URLs first, while GPT often fetches the last one mentioned. Has anyone else seen ordering bias in which URLs get requested when multiple are in the same prompt? It would make a nice follow-up experiment if your logs have that granularity.
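Measuring that ordering bias from logs comes down to comparing first-request times against prompt positions. A sketch under stated assumptions: all prompt URLs are on one host with distinct paths, and the log has already been reduced to `(timestamp, path)` pairs; `fetch_order` is a hypothetical helper. Nginx's `$time_local` has one-second resolution, so near-simultaneous fetches tie; logging `$msec` gives millisecond timestamps.

```python
from urllib.parse import urlparse


def fetch_order(prompt_urls, log_entries):
    """Prompt positions (0-based) in the order their paths were first requested.

    prompt_urls: URLs in the order they appeared in the prompt.
    log_entries: (timestamp, path) pairs from the access log, in any order.
    URLs that were never fetched are simply absent from the result.
    """
    position = {urlparse(url).path: i for i, url in enumerate(prompt_urls)}
    first_seen = {}
    for ts, path in sorted(log_entries):
        if path in position and path not in first_seen:
            first_seen[path] = ts
    # dicts keep insertion order, and entries were visited chronologically
    return [position[path] for path in first_seen]
```

A result starting with `0` across many runs would point to top-of-message bias; one starting with the last index would point the other way.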
hajimuz 5 hours ago
I’m curious about the headers of their requests. Is any of them sending a text/markdown Accept header, for example?
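Nginx's default `combined` log format does not record the Accept header; to see it, a custom `log_format` would need to include `$http_accept` (Nginx exposes request headers as `$http_<name>` variables). Given a logged value, a check for an explicit markdown request might look like this sketch; the function names are hypothetical, and wildcards such as `*/*` (which would also match markdown under RFC 9110) are deliberately ignored, since the question is whether an agent asks for markdown by name.

```python
def accept_qvalues(accept_header):
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    qvalues = {}
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue  # skip empty entries such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        qvalues[media_range] = q
    return qvalues


def asks_for_markdown(accept_header):
    """True if the client explicitly lists text/markdown with a non-zero q-value."""
    return accept_qvalues(accept_header).get("text/markdown", 0.0) > 0
```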
cruffle_duffle 4 hours ago
I wish debates about “AI scraping my site” had more nuance. There are multiple ways these tools access your site, and only one of them is “using it for training”. Others are web fetches from chat sessions, “deep research” agents, etc., and those will have different traffic patterns. They aren’t crawlers; they are clumsy, ham-handed AI agents doing their humans’ bidding. Both can give a site the hug of death. Both can be badly coded. But the intent behind the two is very different, and I feel it is important to acknowledge the difference.
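Where vendors use distinct user-agent tokens, log traffic can be bucketed by intent. A sketch: the token-to-purpose mapping is an assumption based on the vendors' published crawler documentation at the time of writing and may change, so check it against current docs. Agents that fetch with an ordinary browser UA (as Copilot reportedly did in the article) fall into `"unknown"`.

```python
# Assumed mapping; verify against each vendor's current crawler documentation.
AGENT_PURPOSE = {
    "GPTBot": "training crawl",
    "OAI-SearchBot": "search index",
    "ChatGPT-User": "user-initiated fetch",
    "ClaudeBot": "training crawl",
    "Claude-User": "user-initiated fetch",
    "PerplexityBot": "search index",
    "Perplexity-User": "user-initiated fetch",
}


def purpose(user_agent):
    """Bucket a request by the first known token found in its UA (case-insensitive)."""
    ua = user_agent.lower()
    for token, label in AGENT_PURPOSE.items():
        if token.lower() in ua:
            return label
    return "unknown"  # includes agents that look like an ordinary browser
```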
dalton_zk 4 hours ago
You're not burning money?
worik an hour ago
> Microsoft Copilot fetched the page as plain Chrome 135 on Linux x86_64, with a full browser-style Accept header and the usual burst of CSS,

Microsoft pushing up the Linux Desktop count. I doubt that is corporate policy!
realaccfromPL 5 hours ago
Looks like a very fun exercise. I will try it out as well; thanks for the idea!
dawolf- 5 hours ago
So for the user-agent "ChatGPT-User" I can return my prompt injection text. Got it.
shermantanktop 4 hours ago
This article is absolutely jammed with AI tells. "Not this, but that." "Here's why X matters." "This matters more than that." The content is interesting, but it's delivered in an article that smells like slop.