dredmorbius 2 hours ago
Though I'd think that you'd want to weight unaffiliated sites' anchor text to a given URL much higher than an affiliated site's. "Affiliation" is a tricky term itself. Content farms were popular in the aughts (though they seem to have largely subsided), firms such as Claria and Gator. There are chumboxes (Outbrain, Taboola), and of course affiliate links (e.g., to Amazon or other shopping sites). SEO manipulation is its own whole universe. (I'm sure you know far more about this than I do; I'm mostly talking at other readers, and maybe hoping to glean some more wisdom from you ;-)
marginalia_nu 2 hours ago
Oh yeah, there's definitely room for improvement in that general direction. Indexing anchor texts is much better than page rank, but in isolation, it's not sufficient. I've also seen some benefit from fingerprinting the network traffic the websites make, using a headless browser, to identify which ad networks they load. Very few spam sites have no ads, since there wouldn't be any economy in that. E.g. https://marginalia-search.com/site/www.salon.com?view=traffi... The full data set of DOM samples + recorded network traffic is in an enormous sqlite file (400GB+), and I haven't yet worked out any way of distributing the data. Though it's in the back of my mind as something I'd like to solve.
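For other readers, here is a rough sketch of what the ad-network identification step could look like once the request URLs have been recorded. It is not Marginalia's actual code: the function name `ad_networks` and the small `AD_NETWORK_SUFFIXES` table are assumptions made up for illustration, and a real list would be far longer.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative, hand-picked table (an assumption, not a real blocklist):
# host suffix -> ad network label.
AD_NETWORK_SUFFIXES = {
    "doubleclick.net": "Google Ads",
    "googlesyndication.com": "Google Ads",
    "taboola.com": "Taboola",
    "outbrain.com": "Outbrain",
}


def ad_networks(request_urls):
    """Count recorded requests per ad network, matching on host suffix."""
    counts = Counter()
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        for suffix, network in AD_NETWORK_SUFFIXES.items():
            # Match the domain itself or any subdomain, but not lookalikes
            # such as "nottaboola.com".
            if host == suffix or host.endswith("." + suffix):
                counts[network] += 1
                break
    return counts
```

Feeding it the URLs a headless browser requested while loading a page gives a per-network count, and an empty result is the "no ads" case mentioned above.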