| ▲ | renegat0x0 3 days ago |
| Well, I created my own domain index. I have not crawled every page inside each domain, but that is not my goal. I have 1542766 domains. It might not be much, but it is honest work. It is available as a GitHub repo, so anybody who wants to start crawling has some initial data to kick off with. Link: https://github.com/rumca-js/Internet-Places-Database |
|
| ▲ | raybb 2 days ago | parent | next [-] |
| What a nice project. What inspired this initially? FYI, there's a broken link in your readme: the "full internet search" link (https://rumca-js.github.io/internet). |
|
| |
|
| ▲ | hobs 2 days ago | parent | prev | next [-] |
| Can't you just request ICANN's zone files and have the canonical list of the day? |
| |
| ▲ | renegat0x0 2 days ago | parent | next [-] |
| A link list or domain list is not worth much without ratings or metadata. I run a hobby project and I am not an expert, so I rate domains based on what kind of data their pages provide (title, social, description), plus my own manual voting system. It is not ideal, but it is something. I also provide tags, so it is easy to see what a domain offers, and domains can be filtered by tag. I know you cannot count and visit every domain, so the list will never be finished, but I am happy with the results. |
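For illustration, here is a minimal Python sketch of the kind of rating described above: one point for each kind of metadata a page exposes, plus manual votes, with filtering by tag. The `DomainEntry` fields, the one-point weights, and the function names are all assumptions made up for this example, not the schema or scoring actually used in the repo.

```python
from dataclasses import dataclass, field


@dataclass
class DomainEntry:
    """Hypothetical record for one domain; field names are illustrative only."""
    domain: str
    title: str = ""
    description: str = ""
    has_social_meta: bool = False  # e.g. the page exposes social preview tags
    manual_votes: int = 0          # result of a manual voting step
    tags: set[str] = field(default_factory=set)


def rating(entry: DomainEntry) -> int:
    """One point per kind of metadata present, plus manual votes (assumed weights)."""
    score = 0
    if entry.title:
        score += 1
    if entry.description:
        score += 1
    if entry.has_social_meta:
        score += 1
    return score + entry.manual_votes


def filter_by_tag(entries: list[DomainEntry], tag: str) -> list[DomainEntry]:
    """Return entries carrying the tag, highest rated first."""
    matching = [e for e in entries if tag in e.tags]
    return sorted(matching, key=rating, reverse=True)
```

Calling `filter_by_tag(entries, "docs")` on a list of such records would give the tagged domains ordered by this score. A real index would likely weight the signals differently.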
| ▲ | hobs 2 days ago | parent [-] |
| Well, if you are curating every link then it's a different story, and it looks more like a classic webring. I missed that part of the work; I thought it was a big set of crawler data that wasn't as manually curated. |
| |
| ▲ | egberts1 2 days ago | parent | prev [-] |
| Avoiding GIGO (Garbage In, Garbage Out). This is why we have computer variants of Library Science, Archeology, Forensic Science, and a bunch of other advanced knowledge (not AI, mind you). |
| ▲ | hobs 2 days ago | parent [-] |
| I don't see how this applies, as it's aggregating a bunch of stuff from random crawlers. If you want to crawl a list of actual domains, the zone files are generally considered the list of things that could resolve, so they seem like a good starting place. |
|
|
|
| ▲ | didip 2 days ago | parent | prev | next [-] |
| This is amazing. Thanks for sharing! |
|