do_not_redeem 3 days ago
Does "all" mean all the publicly known URLs, or did they exhaustively iterate over the entire URL namespace?

jedberg 3 days ago
They iterated over the entire URL namespace by having volunteers run a client, which spread the requests across many IPs so they didn't get IP-banned.

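The distributed approach described above can be sketched as splitting the keyspace into fixed-size work items that volunteer clients claim one at a time, so no single IP sends every request. This is a minimal sketch, not ArchiveTeam's actual pipeline: the base62 alphabet, the maximum key length, the chunk size, and the names `keyspace_size` and `work_items` are all hypothetical.

```python
import string
from typing import Iterator, Tuple

# Assumption: keys are base62 (digits plus lower- and uppercase letters).
ALPHABET = string.digits + string.ascii_letters


def keyspace_size(max_len: int, alphabet: str = ALPHABET) -> int:
    """Count every key of length 1..max_len over the alphabet."""
    base = len(alphabet)
    return sum(base ** n for n in range(1, max_len + 1))


def work_items(total: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Split key indices 0..total-1 into half-open [start, end) ranges.

    Each range would be handed to one volunteer client (hypothetical
    scheme), so requests come from many IP addresses instead of one.
    """
    for start in range(0, total, chunk):
        yield start, min(start + chunk, total)


if __name__ == "__main__":
    total = keyspace_size(2)  # 62 + 62**2 = 3906
    items = list(work_items(total, 1000))
    print(total, len(items), items[-1])
```

Under these assumptions the space grows quickly: `keyspace_size(6)` is roughly 5.8 × 10^10 keys, which is why spreading the work across many volunteers matters.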
Imustaskforhelp 3 days ago
Are we sure that the entire URL namespace has been mapped? How would that even function? I mean, did they loop through every single permutation and see the result, or what exactly/ how would that work?

jedberg 3 days ago
> did they loop through every single permutation and see the result, or what exactly/ how would that work?

In short, yes. Since no one can create new links anymore, the keyspace is fixed and finite. They just requested every possible key, recorded the response, and then uploaded the results to a shared database.

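The per-key step described above might look roughly like this: map each integer index back to a key, request it without following the redirect, and record the status code and `Location` header. This is a hedged sketch under stated assumptions, not the real pipeline: the base62 alphabet, the use of a `HEAD` request, and the names `index_to_key`, `fetch_location`, and `scan_range` are hypothetical.

```python
import http.client
import string
from typing import Callable, Dict, Optional, Tuple

# Assumption: keys are case-sensitive alphanumerics (base62).
ALPHABET = string.digits + string.ascii_letters

Result = Tuple[int, Optional[str]]  # (HTTP status, Location header or None)


def index_to_key(i: int, alphabet: str = ALPHABET) -> str:
    """Map 0, 1, 2, ... onto every key of length 1, then length 2, and so on."""
    base = len(alphabet)
    n = 1
    # Skip past all shorter lengths to find this index's key length.
    while i >= base ** n:
        i -= base ** n
        n += 1
    # Write the remaining offset as an n-digit base-62 number.
    digits = []
    for _ in range(n):
        i, r = divmod(i, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def fetch_location(key: str, host: str = "goo.gl") -> Result:
    """Request one short link; http.client does not follow redirects."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("HEAD", "/" + key)  # assumption: HEAD is enough
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def scan_range(start: int, end: int,
               fetch: Callable[[str], Result] = fetch_location) -> Dict[str, Result]:
    """Request every key in [start, end) and record the answer for upload."""
    results: Dict[str, Result] = {}
    for i in range(start, end):
        key = index_to_key(i)
        results[key] = fetch(key)
    return results
```

Calling `scan_range(0, 100)` would hit the live host; passing a stub `fetch` lets you check the enumeration offline. Retries, rate limiting, and the upload step are deliberately left out.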
toomuchtodo 3 days ago
If you follow the ArchiveTeam wiki links, the pipeline code is available, so you can review exactly how the HTTP requests were made.

barbazoo 3 days ago
Beautiful. I wish I had seen this in time to help.

ccgreg 3 days ago
The publicly known goo.gl URLs are already in the Internet Archive and Common Crawl crawls.