walletdrainer 3 days ago

> Moreover, it seems like they may be serving public HTML somewhere that links to these files. As a result, hundreds are in Google search results, many containing PII

This is not how Google works.

AndroTux 2 days ago

It kind of is, though. Google doesn't randomly try to visit every URL on the internet. It follows links. Therefore, for these files to be indexed by Google, they need to be linked to from somewhere.
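The link-following model described here can be sketched as a toy breadth-first crawler. This is a simplification that assumes a small in-memory "web" in which each URL maps to the URLs it links to. Real search engines also discover URLs through sitemaps and manual submission, and every URL below is hypothetical.

```python
from collections import deque

def crawl(web: dict[str, list[str]], seeds: list[str]) -> set[str]:
    """Breadth-first link following over a toy, in-memory web.

    `web` maps each URL to the URLs it links to; a URL missing from
    `web` is treated as a leaf (e.g. a PDF with no outbound links).
    """
    discovered = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for link in web.get(url, []):
            if link not in discovered:
                discovered.add(link)
                queue.append(link)
    return discovered

# Hypothetical URLs for illustration only.
web = {
    "https://example.com/": ["https://example.com/about"],
    "https://example.com/about": [],
}
secret_file = "https://files.example.net/a8f3c9e1/tax-return.pdf"

print(secret_file in crawl(web, ["https://example.com/"]))  # False: nothing links to it

# A single reachable page linking to the file is enough for discovery.
web["https://example.com/about"].append(secret_file)
print(secret_file in crawl(web, ["https://example.com/"]))  # True
```

The unlinked file stays invisible to the crawler until one reachable page links to it. In this model, it makes no difference where that page is hosted.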

xtracto 2 days ago

Exactly, that's why "non-public" GitHub gists work. They are public, but not indexed anywhere by default.

Barbing 2 days ago

Good thing; otherwise they would have exposed countless photos via Google Photos.

Today, a photo file might be hosted at:

  photos.fife.usercontent.google.com/pw/[snip]=w[####]-h[####]-s-no-gm?authuser=0
But it used to be a little closer to:

  ...[google_site].com/[superLongAlphanumeric].jpg
And no auth required, URL only!
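The older scheme amounts to a capability URL: anyone holding the link has access, and the only protection is that the path can't be guessed. Below is a minimal sketch using Python's `secrets` module. The host name, the `.jpg` suffix, and the 32-byte (256-bit) token size are all assumptions, not what Google actually used.

```python
import secrets

def capability_url(base: str, nbytes: int = 32) -> str:
    """Return an 'anyone with the link' URL protected only by an
    unguessable path segment of nbytes * 8 random bits."""
    # token_urlsafe base64url-encodes nbytes from the OS CSPRNG,
    # without '=' padding (32 bytes -> 43 characters).
    token = secrets.token_urlsafe(nbytes)
    return f"{base}/{token}.jpg"

url = capability_url("https://usercontent.example.com")  # hypothetical host
print(url)
```

Guessing a 256-bit token is infeasible, so the practical risk is the one discussed in this thread: once the URL is pasted somewhere a crawler can reach, the randomness no longer protects it.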
walletdrainer 2 days ago

> Therefore, for these files to be indexed by Google, they need to be linked to from somewhere.

So? That’s indeed how Google works.

Google does not work how OP describes it.

I’ve investigated similar incidents on other platforms in the past; it was always user error that made the links public.

starkrights 2 days ago

Can you actually explain why the phrase you cited from OP is wrong? You say that ~“files need to be linked to from somewhere” is correct. How is a file linked to from somewhere [on the internet] if it’s not being served on the part of the internet that Google crawls (i.e., HTML)? The only alternative is in… API calls? Which Google probably isn’t crawling?

“Fiverr might be hosting public HTML somewhere” seems like an entirely reasonable alternative phrasing of “these links must be linked from somewhere [that Google can crawl]”, at least to someone who is only superficially familiar with how search works.

The distinction you imply is obvious is not obvious, so your point is rather confusing to anyone who is not you.

walletdrainer 2 days ago

It’s a huge mistake to assume these links have to originate from Fiverr-hosted HTML; it’s far more likely that Google is finding them in places like GitHub repos used by Fiverr users.

morpheuskafka 2 days ago

That was my first thought, but is it logical to assume that 5+ unrelated people took their finished tax return URL and linked it on a website/tweet/etc? Who would do that?

Even so, Fiverr could very well have GDPR/CCPA/etc. liability as the host of these files, because they relate to its services; it's not just a generic file host.

AndroTux 2 days ago

The only thing that's user error here is the developers of Fiverr exposing files without proper session authentication.

walletdrainer 2 days ago

That’s very often a deliberate design decision.

It’s bizarre UX if you link a file to someone and the link doesn’t work.

NotMichaelBay 2 days ago

It's actually very common to link a file hosted in the cloud to a coworker or partner and it requires login.
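The two positions in this sub-thread differ in a single server-side check. Below is a minimal sketch, assuming a hypothetical `StoredFile` record that holds a random URL token and an allow-list of user IDs. Neither function is taken from Fiverr's actual code.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StoredFile:
    token: str  # random segment in the file's URL
    allowed_users: set[str] = field(default_factory=set)  # hypothetical ACL

def link_only_access(f: StoredFile, token: str, user: Optional[str]) -> bool:
    # "Anyone with the link": the URL token alone grants access.
    # compare_digest avoids leaking the token through timing differences.
    return secrets.compare_digest(f.token, token)

def session_access(f: StoredFile, token: str, user: Optional[str]) -> bool:
    # Login required: the token must match AND the logged-in user
    # (None when there is no session) must be on the allow-list.
    return link_only_access(f, token, user) and user is not None and user in f.allowed_users

f = StoredFile(token=secrets.token_urlsafe(32), allowed_users={"seller42", "buyer7"})
print(link_only_access(f, f.token, None))    # True: a crawler with a leaked link gets the file
print(session_access(f, f.token, None))      # False: the same leaked link now requires login
print(session_access(f, f.token, "buyer7"))  # True
```

The session check accepts the "bizarre UX" mentioned above, where a forwarded link asks the recipient to log in. In exchange, an indexed URL on its own exposes nothing.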

weird-eye-issue 2 days ago

It's exactly how it works, pages don't just magically appear in Google's index.

You need links to pages, either from your own website or as backlinks from other websites. Alternatively, if the page is in your sitemap, Google will typically pick it up, or you can manually submit it for indexing. For important pages you would typically want internal links, backlinks, and a sitemap entry.
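For the sitemap route, here is a minimal sketch of a sitemap in the sitemaps.org 0.9 format, built with the standard library's `xml.etree.ElementTree`. The URLs are hypothetical. Real sitemaps usually also include an XML declaration and optional tags such as `<lastmod>`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls: list[str]) -> str:
    """Return a minimal <urlset> sitemap listing each URL in a <loc> tag."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes '&' and '<' in text, as the protocol requires.
        ET.SubElement(entry, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(["https://example.com/", "https://example.com/gigs?page=2&sort=new"]))
```

Every URL listed this way is offered to crawlers directly, with no HTML link required. A sitemap that accidentally includes private file URLs can therefore expose them just as a public link would.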

walletdrainer 2 days ago

Google indexes links from places other than Fiverr; odds are these links mostly come from places like GitHub.