tokioyoyo, 16 hours ago
Large-scale scraping tech is not as sophisticated as you'd think. A significant chunk of it is "get as much as possible, categorize and clean up later." Man, I really want the real web of the 2000s back, when things felt more or less "real"... How can we even get back there?

tmnvix, an hour ago
A curated web directory, kind of like the one Yahoo had. The internet organized according to the Dewey system, with pages somehow rated for quality by actual humans (maybe there's something to learn from Wikipedia's approach here?).

n1xis10t, 16 hours ago
If people started making search engines again and Google had more competition, I think things would be pretty sweet.

nephihaha, 5 hours ago
There are other search engines; they've just been marginalised. Even something as mainstream as Bing has been pushed to the side.

tokioyoyo, 16 hours ago
Because of the financial incentives, wouldn't it still end up with people doing things to drive traffic to their websites, though? Maybe it worked differently in the olden days because the web was smaller and people saw it as a means "to explore curiosity"... Maybe I've just gotten old, but I don't want to believe that.

n1xis10t, 16 hours ago
By “doing things to drive traffic to their website”, do you mean SEO-type tricks to manipulate search engine rankings? If so, I think there are probably ways to rank that are immune to tampering. Don’t worry, you’re not just old. The internet kind of sucks now.

makapuf, 11 hours ago
Google was neat in that you didn't see the keyword-spam content, either on the websites or on the portal home pages. The Web was already full of shit (the first ad banner was in 1994? By 1999 you already had "Punch the Monkey" as classy content), but it was more... organic, and you could easily skip it.

PunchyHamster, 8 hours ago
It's a few orders of magnitude harder given how prevalent SEO spam is, and that's just gonna get worse with AI.

thethingundone, 16 hours ago
I would understand that, but it seems they don't store anything; they just re-scrape the same content every hour.

tokioyoyo, 16 hours ago
I'm assuming they do a quick hash check to see if anything has changed? Among scrapers, having the "most up-to-date data" is fairly valuable nowadays as well.
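A minimal sketch of the kind of hash check being guessed at here, assuming the scraper keeps the last digest per URL in memory and compares it against a fresh fetch. `ChangeTracker` and `content_fingerprint` are hypothetical names for illustration, not anything a particular scraper is known to use.

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a fetched page body."""
    return hashlib.sha256(body).hexdigest()


class ChangeTracker:
    """Hypothetical helper: remembers the last fingerprint seen per URL."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def has_changed(self, url: str, body: bytes) -> bool:
        """Store the new fingerprint and report whether it differs from the previous one.

        A URL seen for the first time counts as changed.
        """
        digest = content_fingerprint(body)
        previous = self._seen.get(url)
        self._seen[url] = digest
        return previous != digest


if __name__ == "__main__":
    tracker = ChangeTracker()
    print(tracker.has_changed("https://example.com/a", b"<html>v1</html>"))  # True: first sighting
    print(tracker.has_changed("https://example.com/a", b"<html>v1</html>"))  # False: same bytes
    print(tracker.has_changed("https://example.com/a", b"<html>v2</html>"))  # True: content changed
```

A raw byte hash flips whenever anything on the page changes (timestamps, rotating ads, session tokens), so a real crawler would probably strip volatile parts before hashing; this sketch skips that step.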

idiotsecant, 12 hours ago
Have you ever listened to the 'high-water mark' monologue from Fear and Loathing? It's pretty much just that. It was a unique time and it was neat that we got to see it, but it can't possibly happen again. https://www.youtube.com/watch?v=vUgs2O7Okqc

symbogra, 8 hours ago
Thanks for reminding me about that; what a great monologue. I didn't really understand it when I was younger, but now I feel the same way about software engineering. There was a golden age that finally broke at the end of the 2010s.