Unfortunately, the index is the easy part. Transforming user input into a series of tokens, using them to rank possible matches, and returning the top N by likely relevance is the hard part, and I'm afraid this doesn't appear to do an acceptable job with any of the queries I tested.

There's a reason Google became so popular as quickly as it did. It's even harder to compete in this space nowadays, as the volume of junk and SEO spam is many orders of magnitude worse as a percentage of the corpus than it was back then.

▲

saltysalt 2 hours ago | parent [-]

I am definitely not trying to compete with Google; instead, I am offering an old-school "just search" engine with no tracking, personalization filtering, or AI.

It's driven by my own personal nostalgia for the early Internet, and to find interesting hidden corners of the Internet that are becoming increasingly hard to find on Google after you wade through all of the sponsored results and spam in the first few pages...

	▲	prophesi an hour ago | parent [-]
		There may be a free CS course out there that teaches how to implement a simplified version of Google's PageRank. It's essentially just the recursive idea that a page is important if important pages link to it. The original paper for it is a good read, too. Curiously, it took me forever to find the unaltered version of the paper that includes Appendix A: Advertising and Mixed Motives, which explains how any search engine with an ad-based business model will inherently be biased against the needs of its users[0].

		[0] https://www.site.uottawa.ca/~stan/csi5389/readings/google.pd...
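
For anyone curious, the recursive idea can be sketched in a few lines of Python as a repeated update (power iteration). This is only a toy under my own assumptions, not what any real engine runs. The function name `pagerank`, the four-page link graph, the normalized form of the update, and the way pages without outgoing links are handled are all made up for illustration. The damping factor of 0.85 is the value the original paper suggests.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Assumption: rank held by pages with no outgoing links is
        # redistributed evenly, so the total rank stays at 1.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its damped rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c, and c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, c ends up on top because three pages link to it, and a comes second even though it has only one inbound link, because that link comes from the most important page. That is the "important pages link to it" part in action.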