renegat0x0 3 days ago

Well, I created my own domain index. I have not crawled every page inside the domains, but that is not my goal.

I have 1542766 domains. Might not be much, but it is honest work.

It is available as a GitHub repo, so anybody who wants to start crawling has some initial data to kick off with.

Links

https://github.com/rumca-js/Internet-Places-Database

raybb 2 days ago | parent | next [-]

What a nice project. What inspired this initially?

FYI there's a broken link in your readme:

    https://rumca-js.github.io/internet full internet search
renegat0x0 2 days ago | parent [-]

Thanks, I replaced it with another demo link.

hobs 2 days ago | parent | prev | next [-]

Can't you just request ICANN's zone files and have the canonical list of the day?

renegat0x0 2 days ago | parent | next [-]

Any link list or domain list is not worth much without ratings or metadata. I run a hobby project, and I am not an expert, so I provide ratings based on what kind of data pages provide (title, social, description) and on my own manual voting system. It is not ideal, but it is something. I also provide tags, so it is easy to see what a domain offers, and domains can be filtered by tag.

I know that you cannot count and visit every domain, so the list will never be finished, but I am happy with the results.
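To make the rating idea above concrete, here is a minimal sketch in Python. It is not the project's actual code: the `DomainEntry` fields, the one-point-per-field weighting, and the `rate` and `filter_by_tag` helpers are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DomainEntry:
    # Hypothetical record layout; the real database schema may differ.
    domain: str
    title: str = ""
    description: str = ""
    has_social_meta: bool = False  # e.g. social preview tags were found
    manual_votes: int = 0          # result of a manual voting system
    tags: set[str] = field(default_factory=set)


def rate(entry: DomainEntry) -> int:
    """Score an entry: one point per metadata field present, plus manual votes."""
    score = 0
    if entry.title:
        score += 1
    if entry.description:
        score += 1
    if entry.has_social_meta:
        score += 1
    return score + entry.manual_votes


def filter_by_tag(entries: list[DomainEntry], tag: str) -> list[DomainEntry]:
    """Return only the entries that carry the given tag."""
    return [e for e in entries if tag in e.tags]
```

The weights are arbitrary; the point is only that automatic signals and manual votes can be folded into one number that a list can be sorted by.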

hobs 2 days ago | parent [-]

Well, if you are curating every link then it's a different story, and it looks more like a classic webring. I missed that part of the work; I thought it looked like a big set of crawler data that wasn't as manually curated.

egberts1 2 days ago | parent | prev [-]

Avoiding GIGO (Garbage In, Garbage Out).

This is why we have computer-variants of Library Science and Archeology, Forensic Science and a bunch of other advanced knowledge (not AI, mind you).

hobs 2 days ago | parent [-]

I don't see how this applies, as it's aggregating a bunch of stuff from random crawlers. If you want to crawl a list of actual domains, the zone files are generally considered the list of things that could resolve, so that seems like a good starting place.
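As a rough illustration of using a zone file as a seed list, the sketch below pulls unique delegated names out of NS records. It assumes a flattened, one-record-per-line format in which every line repeats the owner name (`name. ttl class type rdata`); the `domains_from_zone` helper is hypothetical, and real zone files can use `$ORIGIN` and relative names that this does not handle.

```python
def domains_from_zone(lines):
    """Collect unique owner names of NS records from zone-file lines.

    Assumes each line is "owner [ttl] [class] type rdata" with a fully
    qualified owner name, and that lines starting with ";" are comments.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        # The record type sits after the owner and optional ttl/class.
        if "ns" not in [f.lower() for f in fields[1:4]]:
            continue
        name = fields[0].rstrip(".").lower()
        # Skip the zone apex itself (e.g. "com"), keep delegated names.
        if "." in name:
            seen.add(name)
    return sorted(seen)
```

This only gives names that are delegated in the zone; whether anything useful is served there still has to be found out by crawling.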

didip 2 days ago | parent | prev | next [-]

This is amazing. Thanks for sharing!

bufferoverflow 2 days ago | parent | prev [-]

[dead]