bovermyer 6 hours ago
I'm curious about what it would take to build my own "toy" search engine with its own index. Anyone ever tried this?
marginalia_nu 6 hours ago
Yeah, that's where I started out in 2021. Been at it for almost 5 years now, the last three of which full time. I'm indexing about 1.1 billion documents now off a single server. The hard part is doing it at any sort of scale and producing useful results. It's easy to build something that indexes a few million documents. Pushing into billions is a bigger challenge, as you start needing a lot of increasingly intricate bespoke solutions. Devlog here: https://www.marginalia.nu/tags/search-engine/ And the search engine itself: https://marginalia-search.com/ (... though it operates a bit sub-optimally now as I'm using a ton of CPU cores to migrate the index to use postings list compression, will take about 4-5 days I think).
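For anyone wondering what a toy index with compressed postings lists might look like, here is a minimal sketch in Python. It is not how Marginalia does it; the scheme (delta-encode sorted document ids, then varint-encode the gaps) is just one common textbook approach, and every name here (`build_index`, `compress_postings`, `search`) is made up for illustration. It assumes whitespace tokenization and non-negative integer document ids.

```python
from collections import defaultdict


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte
            return bytes(out)


def compress_postings(doc_ids: list[int]) -> bytes:
    """Store gaps between sorted doc ids instead of the ids themselves."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += encode_varint(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decompress_postings(data: bytes) -> list[int]:
    """Inverse of compress_postings: decode varints and sum the gaps back up."""
    doc_ids = []
    current = 0  # running sum of gaps
    value = 0    # varint being assembled
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += value
            doc_ids.append(current)
            value = 0
            shift = 0
    return doc_ids


def build_index(docs: dict[int, str]) -> dict[str, bytes]:
    """Map each lowercased term to a compressed postings list (hypothetical toy indexer)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: compress_postings(sorted(ids)) for term, ids in postings.items()}


def search(index: dict[str, bytes], query: str) -> list[int]:
    """AND query: return ids of documents containing every query term."""
    result = None
    for term in query.lower().split():
        ids = set(decompress_postings(index.get(term, b"")))
        result = ids if result is None else result & ids
    return sorted(result or [])
```

Usage would be something like `index = build_index({1: "toy search engine", 2: "search index"})` followed by `search(index, "search index")`. Small gaps fit in a single byte, which is where the space saving comes from on long postings lists; ranking, crawling, and on-disk layout are all left out here.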
Gigachad 6 hours ago
Might find YaCy interesting. It's meant to be a decentralised search engine where users scrape the internet and can search other users' indexes in a kind of torrent-like way. I found it didn't really work as a real search engine, but it was interesting.
reddalo 6 hours ago
Good luck scraping websites without being blocked, if you're not Google.