chii 7 months ago

> having a text copy of all articles in their database was some legal risk

the risk should've been the same with Google's index, and yet they're dandy!

I think it's more easily explained by incompetence, especially when stop words like 'of' and 'the' are somehow included in the index. These are almost trivial to remove prior to indexing: any decent indexing library, such as Lucene, would ship with a ready-made stop-word filter, so you barely need to do any work to get it.
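
To show how little work this is, here's a rough sketch of stop-word filtering while building a tiny inverted index. This is not Lucene's API or whatever the site actually ran; `STOP_WORDS`, `tokenize`, and `build_index` are made-up names for illustration, and the word list is a deliberately short stand-in for the longer lists real libraries ship.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption, not any library's actual list).
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each non-stop-word term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in STOP_WORDS:  # drop stop words before indexing
                index[token].add(doc_id)
    return dict(index)


docs = {1: "The history of the index", 2: "Index of articles"}
index = build_index(docs)
```

With that in place, 'of' and 'the' never make it into `index`, while 'index' still maps to both documents.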

kamarg 7 months ago | parent [-]

> the risk should've been the same with Google's index, and yet they're dandy!

Sure, it should be, but in reality Google has many more, and probably better, lawyers, so the risk is clearly different.