| ▲ | lich_king 3 hours ago |
| It's easy to hand-curate a list of 5,000 "small web" URLs. The problem is scaling. For example, Kagi has a hand-curated "small web" filter, but I never use it because far more interesting and relevant "small web" websites are outside the filter than in it. The same is true for most other lists curated by individual folks. They're neat, but also sort of useless because they are too small: 95% of the things you're looking for are not there. The question is how do you take it to a million? There probably are at least that many good personal and non-commercial websites out there, but if you open it up, you invite spam & slop. |
|
| ▲ | nottorp 9 minutes ago | parent | next [-] |
| > The question is how do you take it to a million? |
| Do you need to take it to a million in the same place? Is that still "small"? Why not have 2000 hand-curated directories instead? |
|
| ▲ | freediver 2 hours ago | parent | prev | next [-] |
| I mainly use Kagi Small Web as the starting point of my day, with my morning coffee. Especially now that categories have been added, I always find something worth reading. The size does not present a problem here, as I usually browse 20-30 sites this way. |
| |
| ▲ | lich_king 2 hours ago | parent [-] |
| Right, but that basically works as a retro alternative to scrolling through social media. If you're looking for something specific, it's simultaneously true that there's a small web page that answers your question and that it's not on any "small web" list, either because the owner of the webpage never submitted it or because it didn't meet the criteria for inclusion. For example, I have several non-commercial, personal websites that I think anyone would agree are "small web", but each of them fails the Kagi inclusion criteria for a different reason. One is not a blog, another is a blog but with the wrong cadence of posts, etc. |
| ▲ | freediver 2 hours ago | parent [-] |
| Feel free to suggest changes to the criteria for inclusion. It is mostly the way it is now because the entire project is maintained by one person - me :) |
| ▲ | lich_king 35 minutes ago | parent [-] |
| Looking at the criteria again, I can think of at least three things that arbitrarily exclude large swathes of the small web: |
| 1) The requirement that it needs to be a blog. There are plenty of small-web sites by people who obsess over really wonderful and wacky stuff (e.g., https://www.fleacircus.co.uk/History.htm) but that don't qualify here. |
| 2) The requirement that it needs to be updated regularly. Same as above - I get that infrequently updated websites don't generate a "daily morning" feed, but admitting them wouldn't do any harm. |
| 3) Blanket ban on Substack-like platforms while allowing Blogspot, Wordpress.com, YouTube, etc. Bloggers follow trends, so you're effectively excluding a significant proportion of personal blogs created in the last six years, including the stuff that isn't monetized or behind interstitials. The outcomes are pretty weird: for example, noahpinionblog.blogspot.com is on your list, but noahpinion.blog is apparently no longer small web. |
|
| ▲ | cosmicgadget an hour ago | parent | prev [-] |
| My approach operates under the assumption that good, non-commercial webpages will be similar to other good webpages. Slop, SEO spam, and affiliate content will resemble other such content. So a similarity-based graph/network of webpages should cluster good with good, bad with bad. That is what I've seen so far, anyway. With that, you just need to enter the graph in the right place, something that is fairly trivial. |
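To make the similarity-graph idea above concrete, here is a minimal sketch. It is not cosmicgadget's actual implementation: the comment does not say how similarity is measured, so this assumes plain bag-of-words cosine similarity, an arbitrary edge threshold of 0.3, and made-up page URLs and texts. Pages become nodes, sufficiently similar pages get an edge, and "entering the graph in the right place" is modeled as a breadth-first walk from one known-good seed page.

```python
import math
import re
from collections import Counter, deque


def vectorize(text):
    """Bag-of-words term counts (an assumed stand-in for a real page embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_graph(pages, threshold=0.3):
    """Connect every pair of pages whose similarity meets the threshold."""
    vecs = {url: vectorize(text) for url, text in pages.items()}
    graph = {url: set() for url in pages}
    urls = list(pages)
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            if cosine(vecs[u], vecs[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def reachable_from(graph, seed):
    """Breadth-first walk: every page connected to the seed by similarity edges."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical toy corpus: two hobbyist pages and two affiliate-spam pages.
pages = {
    "a.example/flea-circus": "history of the flea circus with notes on victorian flea performers",
    "b.example/flea-history": "notes on the history of victorian flea circus performers and showmen",
    "c.example/best-vpn": "best vpn deals buy now top ten vpn discount coupon",
    "d.example/vpn-coupons": "top ten vpn coupon deals buy now best discount vpn",
}

graph = build_graph(pages)
good_cluster = reachable_from(graph, "a.example/flea-circus")
```

On this toy data the two hobbyist pages link only to each other, as do the two spam pages, so a walk from the hobbyist seed never reaches the spam. Whether real pages separate this cleanly depends on the similarity measure and the threshold, both of which are guesses here.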